Abstract. We propose SplInter, a new technique for proving properties
of heap-manipulating programs that marries (1) a new separation
logic-based analysis for heap reasoning with (2) an interpolation-based
technique for refining heap-shape invariants with data invariants. SplInter
is property directed, precise, and produces counterexample traces
when a property does not hold. Using the novel notion of spatial
interpolants modulo theories, SplInter can infer complex invariants over
general recursive predicates, e.g., of the form "all elements in a linked list
are even" or "a binary tree is sorted". Furthermore, we treat interpolation
as a black box, which gives us the freedom to encode data manipulation
in any suitable theory for a given program (e.g., bit vectors, arrays, or
linear arithmetic), so that our technique immediately benefits from any
future advances in SMT solving and interpolation.


1 Introduction 

Since the problem of determining whether a program satisfies a given property
is undecidable, every verification algorithm must make some compromise. There
are two classical schools of program verification, which differ in the compromise
they make: the static analysis school gives up refutation soundness (i.e., may
report false positives); and the software model checking school gives up the
guarantee of termination. In the world of integer program verification, both schools
are well explored and enjoy cross-fertilization of ideas: each has its own strengths
and uses in different contexts. In the world of heap-manipulating programs, the
static analysis school is well-attended, while the software model
checking school has remained essentially vacant. This paper initiates a program
to rectify this situation, by proposing one of the first path-based software model
checking algorithms for proving combined shape-and-data properties.

The algorithm we propose, SplInter, marries two celebrated program
verification ideas: McMillan’s lazy abstraction with interpolants (Impact) algorithm
for software model checking [25], and separation logic, a program logic for
reasoning about shape properties [32]. SplInter (like Impact) is based on a
path-sampling methodology: given a program P and safety property $\varphi$, SplInter
constructs a proof that P is memory safe and satisfies $\varphi$ by sampling a finite
number of paths through the control-flow graph of P, proving them safe, and
then assembling proofs for each sample path into a proof for the whole program.
The key technical advance that enables SplInter is an algorithm for spatial
interpolation, which is used to construct proofs in separation logic for the sample
traces (serving the same function as Craig interpolation for first-order logic in
Impact).

SplInter is able to prove properties requiring integrated heap and data
(e.g., integer) reasoning by strengthening separation logic proofs with data
refinements produced by classical Craig interpolation, using a technique we call
spatial interpolation modulo theories. Data refinements are not tied to a specific
logical theory, giving us a rather generic algorithm and freedom to choose an
appropriate theory to encode a program’s data.

Fig. 1 summarizes the high-level operation of our algorithm. Given a program
with no heap manipulation, SplInter only computes theory interpolants and
behaves exactly like Impact; one can thus view SplInter as a proper
extension of Impact to heap-manipulating programs. At the other extreme,
given a program with no data manipulation, SplInter is a new shape analysis
that uses path-based relaxation to construct memory safety proofs in separation
logic.

There is a great deal of work in the static analysis school on shape analysis
and on combined shape-and-data analysis, which we discuss further later in the paper.
We do not claim superiority over these techniques (which have had the benefit
of 20 years of active development). SplInter, as the first member of the
software model checking school, is not better; however, it is fundamentally different.
Nonetheless, we will mention two of the features of SplInter (not enjoyed by
any previous verification algorithm for shape-and-data properties) that make our
approach worthy of exploration: path-based refinement and property-direction.

- Path-based refinement: This supports a progress guarantee by tightly
  correlating program exploration with refinement, and by avoiding imprecision
  due to lossy join and widening operations employed by abstract domains.
  SplInter does not report false positives, and produces counterexamples for
  violated properties. This comes, as usual, at the price of potential divergence.

- Property-direction: Rather than seeking the strongest invariant possible, we
  compute one that is just strong enough to prove that a desired property
  holds. Property direction enables scalable reasoning in rich program logics
  like the one described in this paper, which combines separation logic with
  first-order data refinements.

We have implemented an instantiation of our generic technique in the T2 
verification tool and used it to prove correctness of a number of programs, 
partly drawn from open source software, requiring combined data and heap 
invariants. Our results indicate the usability and promise of our approach. 

Contributions We summarize our contributions as follows:

1. A generic property-directed algorithm for verifying and falsifying safety of
programs with heap and data manipulation.

2. A precise and expressive separation logic analysis for computing memory
safety proofs of program paths using a novel technique we term spatial
interpolation.

3. A novel interpolation-based technique for strengthening separation logic
proofs with data refinements.

4. An implementation and an evaluation of our technique for a fragment of
separation logic with linked lists enriched with linear arithmetic refinements.

Fig. 1. Overview of SplInter verification algorithm.


2 Overview

In this section, we demonstrate the operation of SplInter (Fig. 1) on the
simple linked list example shown in Fig. 2. We assume that integers are
unbounded (i.e., integer values are drawn from $\mathbb{Z}$ rather than machine
integers) and that there is a struct called node denoting a linked list node,
with a next pointer N and an integer (data) element D. The function nondet()
returns a nondeterministic integer value. This program starts by building a
linked list in the loop on location 2. The loop terminates if the initial
value of i is $\geq 0$, in which case a linked list of size i is constructed,
where data elements D of list nodes range from 1 to i. Then, the loop at
location 3 iterates through the linked list, asserting that the data element
of each node in the list is $\geq 0$. Our goal is to prove that the assertion
at location 4 is never violated.

    1: int i = nondet();
       node* x = null;
    2: while (i != 0)
         node* tmp = malloc(node);
         tmp->N = x;
         tmp->D = i;
         x = tmp;
         i--;
    3: while (x != null)
    4:   assert(x->D >= 0);
         x = x->N;

Fig. 2. Illustrative Example
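For readers who want to run the example, the following is a minimal C sketch of the program in Fig. 2. It is not part of SplInter: the helper names build_list and traverse_and_check are hypothetical, nondet() is replaced by an explicit argument restricted to i >= 0 (the terminating case discussed above), and the second loop frees nodes purely for hygiene.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* List node as in the text: an integer data field D and a next pointer N. */
struct node {
    int D;
    struct node *N;
};

/* First loop of the example: pushes nodes with D = i, i-1, ..., 1.
 * Assumption: the caller supplies i >= 0; for i < 0 the loop in the
 * example does not terminate over unbounded integers. */
static struct node *build_list(int i) {
    assert(i >= 0);
    struct node *x = NULL;
    while (i != 0) {
        struct node *tmp = malloc(sizeof *tmp);
        if (tmp == NULL) abort();   /* allocation failure is outside the model */
        tmp->N = x;
        tmp->D = i;
        x = tmp;
        i--;
    }
    return x;
}

/* Second loop: walks the list and reports whether every D is >= 0,
 * which is the property asserted at location 4. Nodes are freed as
 * they are visited (an addition that the example does not perform). */
static bool traverse_and_check(struct node *x) {
    bool ok = true;
    while (x != NULL) {
        if (!(x->D >= 0)) ok = false;
        struct node *next = x->N;
        free(x);
        x = next;
    }
    return ok;
}
```

Calling traverse_and_check(build_list(n)) for a concrete n exercises one path through the program; SplInter's goal is a proof covering all such paths at once.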


Sample a Program Path To start, we need a path $\pi$ through the program to
the assertion at location 4. Suppose we start by sampling the path 1, 2, 2, 3, 4,
that is, the path that goes through the first loop once, and enters the second loop
arriving at the assertion. This path is illustrated in Fig. 3 (where 2a indicates
the second occurrence of location 2). Our goal is to construct a Hoare-style
proof of this path: an annotation of each location along the path with a formula
describing reachable states, such that location 4 is annotated with a formula
implying that x->D >= 0. This goal is accomplished in two phases. First, we use
spatial interpolation to compute a memory safety proof for the path $\pi$ (Fig. 3(b)).
Second, we use theory refinement to strengthen the memory safety proof and
establish that the path satisfies the post-condition x->D >= 0 (Fig. 3(c)).












Compute Spatial Interpolants The first step in constructing the proof is to 
find spatial interpolants: a sequence of separation logic formulas approximating 
the shape of the heap at each program location, and forming a Hoare-style 
memory safety proof of the path. Our spatial interpolation procedure is a two-step
process that first symbolically executes the path in a forward pass and then
derives a weaker proof using a backward pass. The backward pass can be thought 
of as an under-approximate weakest precondition computation, which uses the 
symbolic heap from the forward pass to guide the under-approximation. 

We start by showing the symbolic heaps in Fig. 3(a), which are the result
of the forward pass obtained by symbolically executing only heap statements
along this program path (i.e., the strongest postcondition along the path). The
separation logic annotations in Fig. 3 follow standard notation, where
a formula is of the form $\Pi : \Sigma$, where $\Pi$ is a Boolean first-order formula over
heap variables (pointers) as well as data variables (e.g., $x = \mathsf{null}$ or $i > 0$),
and $\Sigma$ is a spatial conjunction of heaplets (e.g., $\mathsf{emp}$, denoting the empty heap,
or $Z(x, y)$, a recursive predicate, e.g., that denotes a linked list between x and
y). For the purposes of this example, we assume a recursive predicate $\mathsf{ls}(x, y)$
that describes linked lists. In our example, the symbolic heap at location 2a is
$\mathit{true} : x \mapsto [d', \mathsf{null}]$, where the heap consists of a node, pointed to by variable x,
with null in the N field and the (implicitly existentially quantified) variable $d'$ in
the D field (since so far we are only interested in heap shape and not data).

The symbolic heaps determine a memory safety proof of the path, but it
is too strong and would likely not generalize to other paths. The goal of
spatial interpolation is to find a sequence of annotations that are weaker than the
symbolic heaps, but that still prove memory safety of the path. A sequence of
spatial interpolants is shown in Fig. 3(b). Note that all spatial interpolants are
implicitly spatially conjoined with true; for clarity, we avoid explicitly
conjoining formulas with true in the figure. For example, location 2 is annotated with
$\mathit{true} : \mathsf{ls}(x, \mathsf{null}) * \mathit{true}$, indicating that there is a list on the heap, as well as other
potential objects not required to show memory safety. We compute spatial
interpolants by going backwards along the path and asking questions of the form: how
much can we weaken the symbolic heap while still maintaining memory safety?
We will describe how to answer such questions in Section 4.

Refine with Theory Interpolants Spatial interpolants give us a memory 
safety proof as an approximate heap shape at each location. Our goal now is to 
strengthen these heap shapes with data refinements, in order to prove that the 
assertion at the end of the path is not violated. To do so, we generate a system 
of Horn clause constraints from the path in some first-order theory admitting 
interpolation (e.g., linear arithmetic). These Horn clauses carefully encode the 
path’s data manipulation along with the spatial interpolants, which tell us heap 
shape at each location along the path. A solution of this constraint system, which
can be solved using off-the-shelf interpolant generation techniques,
is a refinement (strengthening) of the memory safety proof.

In this example, we encode program operations over integers in the theory
of linear integer arithmetic, and use Craig interpolants to solve the system of
constraints. A solution of this system is a set of linear arithmetic formulas that
refine our spatial interpolants and, as a result, imply the assertion we want to
prove holds. One possible solution is shown in Fig. 3(c). For example, location
2a is now labeled with $\mathit{true} : \mathsf{ls}((\lambda\nu.\nu \geq i), x, \mathsf{null})$, where the refinement
$(\lambda\nu.\nu \geq i)$ is the part of the formula added by refinement. Specifically, after
refinement, we know that all elements in the list from x to null after the first loop
have data values greater than or equal to i, as indicated by the predicate
$(\lambda\nu.\nu \geq i)$. (In Section 3, we formalize recursive predicates with data refinements.)

Location 4 is now annotated with $d' \geq 0 : x \mapsto [d', n'] * \mathit{true}$, which implies
that x->D >= 0, thus proving that the path satisfies the assertion.

Path edges:

    1  -> 2 :  int i = nondet(); node* x = null
    2  -> 2a:  assume(i != 0); node* tmp = ...; tmp->N = x; tmp->D = i; x = tmp; i--
    2a -> 3 :  assume(i == 0)
    3  -> 4 :  assume(x != null)
    at 4    :  assert(x->D >= 0)

Annotations:

    Loc.  (a) symbolic execution                      (b) spatial interpolants                         (c) spatial(T) interpolants
    1     $\mathit{true} : \mathsf{emp}$                       $\mathit{true} : \mathsf{emp}$                            $\mathit{true} : \mathsf{emp}$
    2     $x = \mathsf{null} : \mathsf{emp}$                   $\mathit{true} : \mathsf{ls}(x, \mathsf{null})$           $\mathit{true} : \mathsf{ls}((\lambda\nu.\nu \geq i), x, \mathsf{null})$
    2a    $\mathit{true} : x \mapsto [d', \mathsf{null}]$      $\mathit{true} : \mathsf{ls}(x, \mathsf{null})$           $\mathit{true} : \mathsf{ls}((\lambda\nu.\nu \geq i), x, \mathsf{null})$
    3     $\mathit{true} : x \mapsto [d', \mathsf{null}]$      $\mathit{true} : \mathsf{ls}(x, \mathsf{null})$           $\mathit{true} : \mathsf{ls}((\lambda\nu.\nu \geq 0), x, \mathsf{null})$
    4     $\mathit{true} : x \mapsto [d', \mathsf{null}]$      $\mathit{true} : x \mapsto [d', n']$                      $d' \geq 0 : x \mapsto [d', n']$

Fig. 3. Path through program in Fig. 2, annotated with (a) results of forward symbolic
execution, (b) spatial interpolants, and (c) spatial(T) interpolants, where T is linear
integer arithmetic. At each location, the entry in (a) entails the entry in (b), as does
the entry in (c).

From Proofs of Paths to Proofs of Programs We go from proofs of paths
to whole program proofs implicitly by building an abstract reachability tree as in
Impact [25]. To give a flavour for how this works, consider that the assertions
at 2 and 2a are identical: this implies that this assertion is an inductive invariant
at line 2. Since this assertion also happens to be strong enough to prove safety of
the program, we need not sample any longer unrollings of the first loop. However,
since we have not established the inductiveness of the assertion at 3, the proof is
not yet complete and more traces need to be explored (in fact, exploring one more
trace will do: consider the trace that unrolls the second loop once and shows that
the second time 3 is visited can also be labeled with $\mathit{true} : \mathsf{ls}((\lambda\nu.\nu \geq 0), x, \mathsf{null})$).

Since our high-level algorithm is virtually the same as Impact [25], we will 
not describe it further in the paper. For the remainder of this paper, we will 
concentrate on the novel contribution of our algorithm: computing spatial
interpolants with theory refinements for program paths.

3 Preliminaries 

3.1 Separation Logic 

We define RSep, a fragment of separation logic formulas featuring points-to
predicates and general recursive predicates refined by theory propositions.







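$$
\begin{array}{rcl}
x, y \in \mathsf{HVar} & & \text{(Heap variables)} \\
a, b \in \mathsf{DVar} & & \text{(Data variables)} \\
A \in \mathsf{DTerm} & & \text{(Data terms)} \\
\varphi \in \mathsf{DFormula} & & \text{(Data formulas)} \\
Z \in \mathsf{RPred} & & \text{(Rec. predicates)} \\
\theta \in \mathsf{Refinement} & ::= & \lambda \vec{a}.\ \varphi \\
X \subseteq \mathsf{Var} & ::= & x \mid a \\[4pt]
E, F \in \mathsf{HTerm} & ::= & \mathsf{null} \mid x \\
\text{\AE} & ::= & A \mid E \\
\Pi \in \mathsf{Pure} & ::= & \mathit{true} \mid E = E \mid E \neq E \mid \varphi \mid \Pi \wedge \Pi \\
H \in \mathsf{Heaplet} & ::= & \mathit{true} \mid \mathsf{emp} \mid E \mapsto [\vec{A}, \vec{E}] \mid Z(\vec{\theta}, \vec{E}) \\
\Sigma \in \mathsf{Spatial} & ::= & H \mid H * \Sigma \\
P \in \mathsf{RSep} & ::= & (\exists X.\ \Pi : \Sigma)
\end{array}
$$

Fig. 4. Syntax of RSep formulas.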


Fig. 4 defines the syntax of RSep formulas. In comparison with the
standard list fragment used in separation logic analyses, the
differentiating features of RSep are: (1) General recursive predicates, for describing
unbounded pointer structures like lists, trees, etc. (2) Recursive predicates are
augmented with a vector of refinements, which are used to constrain the data
values appearing on the data structure defined by the predicate, detailed below.
(3) Each heap cell (points-to predicate), $E \mapsto [\vec{A}, \vec{E}]$, is a record consisting of
data fields (a vector $\vec{A}$ of DTerm) followed by heap fields (a vector $\vec{E}$ of HTerm).
(Notationally, we will use $d_i$ to refer to the $i$th element of the vector $\vec{d}$, and
$\vec{d}[t/d_i]$ to refer to the vector $\vec{d}$ with the $i$th element modified to $t$.) (4) Pure
formulas contain heap and first-order data constraints.

Our definition is (implicitly) parameterized by a first-order theory T. DVar 
denotes the set of theory variables, which we assume to be disjoint from HVar 
(the set of heap variables). DTerm and DFormula denote the sets of theory terms 
and formulas, and we assume that heap variables do not appear in theory terms. 

For an RSep formula $P$, $\mathit{Var}(P)$ denotes its free (data and heap) variables. We
treat a Spatial formula $\Sigma$ as a multiset of heaplets, and consider formulas to be
equal when they are equal as multisets. For RSep formulas $P = (\exists X_P.\ \Pi_P : \Sigma_P)$
and $Q = (\exists X_Q.\ \Pi_Q : \Sigma_Q)$, we write $P * Q$ to denote the RSep formula

$$P * Q = (\exists X_P \cup X_Q.\ \Pi_P \wedge \Pi_Q : \Sigma_P * \Sigma_Q)$$

assuming that $X_P$ is disjoint from $\mathit{Var}(Q)$ and $X_Q$ is disjoint from $\mathit{Var}(P)$ (if
not, then $X_P$ and $X_Q$ are first suitably renamed). For a set of variables $X$, we
write $(\exists X.\ P)$ to denote the RSep formula

$$(\exists X.\ P) = (\exists X \cup X_P.\ \Pi_P : \Sigma_P)$$

Recursive predicates Each recursive predicate $Z \in \mathsf{RPred}$ is associated with
a definition that describes how the predicate is unfolded. Before we formalize
these definitions, we will give some examples.

The definition of the list segment predicate from Sec. 2 is:

$$\mathsf{ls}(R, x, y) \equiv (x = y : \mathsf{emp}) \vee (\exists d, n'.\ x \neq y \wedge R(d) : x \mapsto [d, n'] * \mathsf{ls}(R, n', y))$$

In the above, $R$ is a refinement variable, which may be instantiated to a concrete
refinement $\theta \in \mathsf{Refinement}$. For example, $\mathsf{ls}((\lambda a.\ a \geq 0), x, y)$ indicates that there
is a list from x to y where every element of the list is at least 0.
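To make the unfolding concrete, here is a small C sketch that decides whether a concrete, acyclic list satisfies the data part of ls(R, x, y) by following the two cases of the definition. The names ls_holds, refinement, and nonneg are hypothetical, and the check is only an illustration on concrete memory: it does not model the separation-logic footprint (what else is or is not on the heap), and it assumes y is reachable from x or the list ends in NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    int D;
    struct node *N;
};

/* A unary refinement on data values, standing in for (lambda a. phi). */
typedef bool (*refinement)(int);

/* Example refinement: (lambda a. a >= 0). */
static bool nonneg(int a) { return a >= 0; }

/* Checks ls(R, x, y) on a concrete list by unfolding the definition:
 *   case 1: x = y, the empty segment;
 *   case 2: x != y, R(d) holds for the cell x -> [d, n'], and
 *           ls(R, n', y) holds for the rest.
 * Assumption: the list is acyclic, otherwise the loop may not terminate. */
static bool ls_holds(refinement R, const struct node *x, const struct node *y) {
    assert(R != NULL);
    while (x != y) {
        if (x == NULL) return false;   /* ran off the list before reaching y */
        if (!R(x->D)) return false;    /* refinement fails on this cell */
        x = x->N;                      /* continue with ls(R, n', y) */
    }
    return true;
}
```

For instance, ls_holds(nonneg, head, NULL) corresponds to checking ls((λa. a ≥ 0), head, null) on a concrete list.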








A refined binary tree predicate is a more complicated example:

$$
\begin{array}{rl}
\mathsf{bt}(Q, L, R, x) \equiv & (x = \mathsf{null} : \mathsf{emp}) \\
\vee & (\exists d, l, r.\ Q(d) : x \mapsto [d, l, r] \\
     & \quad * \ \mathsf{bt}((\lambda a.\ Q(a) \wedge L(d, a)), L, R, l) \\
     & \quad * \ \mathsf{bt}((\lambda a.\ Q(a) \wedge R(d, a)), L, R, r))
\end{array}
$$

This predicate has three refinement variables: a unary refinement $Q$ (which must
be satisfied by every node in the tree), a binary refinement $L$ (which is a relation
that must hold between every node and its descendants to the left), and a binary
refinement $R$ (which is a relation that must hold between every node and its
descendants to the right). For example,

$$\mathsf{bt}((\lambda a.\ \mathit{true}), (\lambda a, b.\ a \geq b), (\lambda a, b.\ a \leq b), x)$$

indicates that x is the root of a binary search tree, and

$$\mathsf{bt}((\lambda a.\ a \geq 0), (\lambda a, b.\ a \leq b), (\lambda a, b.\ a \leq b), x)$$

indicates that x is the root of a binary min-heap with non-negative elements.
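The way refinements accumulate down the tree can be illustrated with a short C sketch. It checks the data part of bt(Q, L, R, x) on a concrete tree by carrying a chain of constraints contributed by ancestors, which plays the role of the strengthened refinement (λa. Q(a) ∧ L(d, a)). All names here (bt_holds, bt_rec, struct constraint, always, ge, le) are hypothetical, and the sketch assumes a proper tree (no sharing, no cycles) rather than checking separation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tnode {
    int D;
    struct tnode *l, *r;
};

typedef bool (*unary)(int);        /* Q in the text */
typedef bool (*binary)(int, int);  /* L and R: rel(ancestor data, descendant data) */

/* One conjunct rel(d, .) contributed by an ancestor whose data is d. */
struct constraint {
    int d;
    binary rel;
    const struct constraint *next;
};

static bool bt_rec(unary Q, binary L, binary R, const struct tnode *x,
                   const struct constraint *acc) {
    if (x == NULL) return true;                 /* case x = null : emp */
    if (!Q(x->D)) return false;                 /* base refinement Q(d) */
    for (const struct constraint *c = acc; c != NULL; c = c->next)
        if (!c->rel(c->d, x->D)) return false;  /* conjuncts added by ancestors */
    /* Recurse with the refinement strengthened by L(d, .) or R(d, .). */
    struct constraint left = { x->D, L, acc };
    struct constraint right = { x->D, R, acc };
    return bt_rec(Q, L, R, x->l, &left) && bt_rec(Q, L, R, x->r, &right);
}

/* Assumption: x is the root of a proper tree (no sharing, no cycles). */
static bool bt_holds(unary Q, binary L, binary R, const struct tnode *x) {
    assert(Q != NULL && L != NULL && R != NULL);
    return bt_rec(Q, L, R, x, NULL);
}

/* Sample refinements matching the two instances in the text. */
static bool always(int a) { (void)a; return true; }
static bool nonneg(int a) { return a >= 0; }
static bool ge(int a, int b) { return a >= b; }
static bool le(int a, int b) { return a <= b; }
```

With these helpers, bt_holds(always, ge, le, root) corresponds to the binary search tree instance and bt_holds(nonneg, le, le, root) to the min-heap instance. Because constraints from every ancestor are kept, a node that respects its parent but violates a grandparent is still rejected.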

To formalize these definitions, we first define refinement terms and refined
formulas: a refinement term $\tau$ is either (1) a refinement variable $R$ or (2) an
abstraction $(\lambda a_1, \ldots, a_n.\ \Phi)$, where $\Phi$ is a refined formula. A refined formula is
a conjunction where each conjunct is either a data formula (DFormula) or the
application $\tau(\vec{A})$ of a refinement term to a vector of data terms (DTerm).

A predicate definition has the form

$$Z(\vec{R}, \vec{x}) \equiv (\exists X_1.\ \Pi_1 \wedge \Phi_1 : \Sigma_1) \vee \cdots \vee (\exists X_n.\ \Pi_n \wedge \Phi_n : \Sigma_n)$$

where $\vec{R}$ is a vector of refinement variables, $\vec{x}$ is a vector of heap variables, and
where refinement terms may appear as refinements in the spatial formulas $\Sigma_i$.
We refer to the disjuncts of the above formula as the cases for $Z$, and define
$\mathit{cases}(Z(\vec{R}, \vec{x}))$ to be the set of cases of $Z$. $\vec{R}$ and $\vec{x}$ are bound in $\mathit{cases}(Z(\vec{R}, \vec{x}))$,
and we will assume that predicate definitions are closed, that is, for each case of
$Z$, the free refinement variables belong to $\vec{R}$, the free heap variables belong to $\vec{x}$,
and there are no free data variables. We also assume that they are well-typed in
the sense that each refinement term $\tau$ is associated with an arity, and whenever
$\tau(\vec{A})$ appears in a definition, the length of $\vec{A}$ is the arity of $\tau$.

Semantics The semantics of our logic, defined by a satisfaction relation $s, h \models Q$,
is essentially standard. Each predicate $Z \in \mathsf{RPred}$ is defined to be the least
solution¹ to the following equivalence:

$$s, h \models Z(\vec{\theta}, \vec{E}) \iff \exists P \in \mathit{cases}(Z(\vec{R}, \vec{x})).\ s, h \models P[\vec{\theta}/\vec{R}, \vec{E}/\vec{x}]$$

Note that when substituting a $\lambda$-abstraction for a refinement variable, we
implicitly $\beta$-reduce resulting applications. For example, $R(b)[(\lambda a.\ a \geq 0)/R] = b \geq 0$.

Semantic entailment is denoted by $P \models Q$, and provable entailment by $P \vdash Q$.
When referring to a proof that $P \vdash Q$, we will mean a sequent calculus proof.

¹ Our definition does not preclude ill-founded predicates; such predicates are simply
unsatisfiable, and do not affect the technical development in the rest of the paper.



3.2 Programs 

A program $\mathcal{P}$ is a tuple $\langle V, E, v_i, v_e \rangle$, where

- $V$ is a set of control locations, with a distinguished entry node $v_i \in V$ and
  error (exit) node $v_e \in V$, and

- $E \subseteq V \times V$ is a set of directed edges, where each $e \in E$ is associated with a
  program command $e^c$.

We impose the restriction that all nodes $V \setminus \{v_i\}$ are reachable from $v_i$ via
$E$, and all nodes can reach $v_e$. The syntax for program commands appears
below. Note that the allocation command creates a record with n data fields,
$D_1, \ldots, D_n$, and m heap fields, $N_1, \ldots, N_m$. To access the ith data field of a
record pointed to by x, we use x->D_i (and similarly for heap fields). We assume
that programs are well-typed, but not necessarily memory safe.

    Assignment: x := Æ        Assumption: assume(Π)     Allocation: x := new(n, m)
    Heap store: x->N_i := E   Data store: x->D_i := A   Disposal:   free(x)
    Heap load:  y := x->N_i   Data load:  y := x->D_i

As is standard, we compile assert commands to reachability of $v_e$.

4 Spatial Interpolants 

In this section, we first define the notion of spatial path interpolants, which 
serve as memory safety proofs of program paths. We then describe a technique 
for computing spatial path interpolants. This algorithm has two phases: the first 
is a (forwards) symbolic execution phase, which computes the strongest memory 
safety proof for a path; the second is a (backwards) interpolation phase, which 
weakens the proof so that it is more likely to generalize. 

Spatial path interpolants are bounded from below by the strongest memory 
safety proof, and (implicitly) from above by the weakest memory safety proof. 
Prior to considering the generation of inductive invariants using spatial path 
interpolants, consider what could be done with only one of the bounds, in
general, with either a path-based approach or an iterative fixed-point computation.
Without the upper bound, an interpolant or invariant could be computed using 
a standard forward transformer and widening. But this suffers from the usual 
problem of potentially widening too aggressively to prove the remainder of the 
path, necessitating the design of analyses which widen conservatively at the price 
of computing unnecessarily strong proofs. The upper bound neatly captures the 
information that must be preserved for the future execution to be proved safe. 
On the other hand, without the lower bound, an interpolant or invariant could be 
computed using a backward transformer (and lower widening). But this suffers 
from the usual problem that backward transformers in shape analysis explode, 
due to issues such as not knowing the aliasing relationship in the pre-state. The 
lower bound neatly captures such information, heavily reducing the potential for 
explosion. These advantages come at the price of operating over full paths from 
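entry to error. Compared to a forwards iterative analysis, operating over full
paths has the advantage of having information about the execution’s past and
future when weakening at each point along the path. A forwards iterative
analysis, on the other hand, trades the information about the future for information
about many past executions through the use of join or widening operations.

$$\mathsf{exec}(x := \mathsf{new}(k, l),\ (\exists X.\ \Pi : \Sigma)) = (\exists X \cup \{x', \vec{d}, \vec{n}\}.\ (\Pi : \Sigma)[x'/x] * x \mapsto [\vec{d}, \vec{n}])$$
where $x', \vec{d}, \vec{n}$ are fresh, $\vec{d} = (d_1, \ldots, d_k)$, and $\vec{n} = (n_1, \ldots, n_l)$.

$$\mathsf{exec}(\mathsf{free}(x),\ (\exists X.\ \Pi : \Sigma * z \mapsto [\vec{d}, \vec{n}])) = (\exists X.\ \Pi \wedge \Pi^{\neq} : \Sigma)$$
where $\Pi : \Sigma * z \mapsto [\vec{d}, \vec{n}] \vdash x = z$ and $\Pi^{\neq}$ is the
conjunction of all disequalities $x \neq y$ s.t. $y \mapsto [\_, \_] \in \Sigma$.

$$\mathsf{exec}(x := E,\ (\exists X.\ \Pi : \Sigma)) = (\exists X \cup \{x'\}.\ (x = E[x'/x]) * (\Pi : \Sigma)[x'/x])$$
where $x'$ is fresh.

$$\mathsf{exec}(\mathsf{assume}(\Pi'),\ (\exists X.\ \Pi : \Sigma)) = (\exists X.\ \Pi \wedge \Pi' : \Sigma)$$

$$\mathsf{exec}(x\texttt{->}N_i := E,\ (\exists X.\ \Pi : \Sigma * z \mapsto [\vec{d}, \vec{n}])) = (\exists X.\ \Pi : \Sigma * x \mapsto [\vec{d}, \vec{n}[E/n_i]])$$
where $i \leq |\vec{n}|$ and $\Pi : \Sigma * z \mapsto [\vec{d}, \vec{n}] \vdash x = z$.

$$\mathsf{exec}(y := x\texttt{->}N_i,\ (\exists X.\ \Pi : \Sigma * z \mapsto [\vec{d}, \vec{n}])) = (\exists X \cup \{y'\}.\ (y = n_i[y'/y]) * (\Pi : \Sigma * z \mapsto [\vec{d}, \vec{n}])[y'/y])$$
where $i \leq |\vec{n}|$ and $\Pi : \Sigma * z \mapsto [\vec{d}, \vec{n}] \vdash x = z$, and $y'$ is fresh.

Fig. 5. Symbolic execution for heap statements. Data statements are treated as skips.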

The development in this section is purely spatial: we do not make use of
data variables or refinements in recursive predicates. Our algorithm is thus of
independent interest, outside of its context in this paper. We use Sep to refer
to the fragment of RSep in which the only data formula (appearing in pure
assertions and in refinements) is true (this fragment is equivalent to classical
separation logic). An RSep formula $P$, in particular including those in recursive
predicate definitions, determines a Sep formula $\underline{P}$ obtained by replacing all
refinements (both variables and $\lambda$-abstractions) with $(\lambda \vec{a}.\ \mathit{true})$ and all DFormulas
in the pure part of $P$ with true. Since recursive predicates, refinements, and
DFormulas appear only positively, $\underline{P}$ is no stronger than any refinement of $P$.
Since all refinements in Sep are trivial, we will omit them from the syntax (e.g.,
we will write $Z(\vec{E})$ rather than $Z((\lambda \vec{a}.\ \mathit{true}), \vec{E})$).

4.1 Definition 

We define a symbolic heap to be a Sep formula where the spatial part is a
$*$-conjunction of points-to heaplets and the pure part is a conjunction of pointer
(dis)equalities. Given a command $c$ and a symbolic heap $S$, we use $\mathsf{exec}(c, S)$ to
denote the symbolic heap that results from symbolically executing $c$ starting in
$S$ (the definition of exec is essentially standard [3], and is shown in Fig. 5).

Given a program path $\pi = e_1, \ldots, e_n$, we obtain its strongest memory safety
proof by symbolically executing $\pi$ starting from the empty heap emp. We call this
sequence of symbolic heaps the symbolic execution sequence of $\pi$, and say that a
path $\pi$ is memory-feasible if every formula in its symbolic execution sequence is
consistent. The following proposition justifies calling this sequence the strongest
memory safety proof.
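As a worked illustration (a sketch, not taken from the original text), the rules of Fig. 5 can be applied to the heap statements of the first loop iteration of the example in Fig. 2. Data statements such as assume(i != 0) and tmp->D := i are skipped, the allocation is written as new(1, 1) on the assumption that a node has one data field and one heap field, and the primed names are the fresh variables introduced by the rules.

```latex
\begin{align*}
S_0 &= (\mathit{true} : \mathsf{emp}) \\
S_1 &= \mathsf{exec}(x := \mathsf{null},\ S_0)
     = (\exists x'.\ x = \mathsf{null} : \mathsf{emp}) \\
S_2 &= \mathsf{exec}(tmp := \mathsf{new}(1,1),\ S_1)
     = (\exists x', tmp', d, n.\ x = \mathsf{null} : tmp \mapsto [d, n]) \\
S_3 &= \mathsf{exec}(tmp\texttt{->}N := x,\ S_2)
     = (\exists x', tmp', d, n.\ x = \mathsf{null} : tmp \mapsto [d, x]) \\
S_4 &= \mathsf{exec}(x := tmp,\ S_3)
     = (\exists x', tmp', d, n, x''.\ x = tmp \wedge x'' = \mathsf{null} : tmp \mapsto [d, x'']) \\
S_4 &\models (\exists d'.\ \mathit{true} : x \mapsto [d', \mathsf{null}])
\end{align*}
```

The last line is the annotation shown for location 2a in Fig. 3(a): after substituting the equalities, the heap consists of a single cell pointed to by x whose next field is null.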


Proposition 1. For a path $\pi$, if the symbolic execution sequence for $\pi$ is
defined, then $\pi$ is memory safe. If $\pi$ is memory safe and memory-feasible, then its
symbolic execution sequence is defined.

Recall that our strategy for proving program correctness is based on sampling
and proving the correctness of several program paths (à la Impact [25]). The
problem with strongest memory safety proofs is that they do not generalize well
(i.e., do not generate inductive invariants).

One solution to this problem is to take advantage of property direction. Given
a desired postcondition $P$ and a (memory-safe and -feasible) path $\pi$, the goal is
to come up with a proof that is weaker than $\pi$’s symbolic execution sequence,
but still strong enough to show that $P$ holds after executing $\pi$. Coming up with
such “weak” proofs is how traditional path interpolation is used in Impact. In
light of this, we define spatial path interpolants as follows:

Definition 1 (Spatial path interpolant). Let $\pi = e_1, \ldots, e_n$ be a program
path with symbolic execution sequence $S_0, \ldots, S_n$, and let $P$ be a Sep formula
(such that $S_n \models P$). A spatial path interpolant for $\pi$ is a sequence $I_0, \ldots, I_n$ of
Sep formulas such that

- for each $i \in [0, n]$, $S_i \models I_i$;

- for each $i \in [1, n]$, $\{I_{i-1}\}\ e_i^c\ \{I_i\}$ is a valid triple in separation logic; and

- $I_n \vdash P$.

Our algorithm for computing spatial path interpolants is a backwards
propagation algorithm that employs a spatial interpolation procedure at each
backwards step. Spatial interpolants for a single command are defined as:

Definition 2 (Spatial interpolant). Given Sep formulas $S$ and $I'$ and a
command $c$ such that $\mathsf{exec}(c, S) \models I'$, a spatial interpolant (for $S$, $c$, and $I'$) is a
Sep formula $I$ such that $S \models I$ and $\{I\}\ c\ \{I'\}$ is valid.

Before describing the spatial interpolation algorithm, we briefly describe how
spatial interpolation is used to compute path interpolants. Let us use $\mathsf{itp}(S, c, I)$
to denote a spatial interpolant for $S, c, I$, as defined above. Let $\pi = e_1, \ldots, e_n$
be a program path and let $P$ be a Sep formula. First, symbolically execute $\pi$
to compute a sequence $S_0, \ldots, S_n$. Suppose that $S_n \vdash P$. Then we compute a
sequence $I_0, \ldots, I_n$ by taking $I_n = P$ and (for $k < n$) $I_k = \mathsf{itp}(S_k, e_{k+1}^c, I_{k+1})$.
The sequence $I_0, \ldots, I_n$ is clearly a spatial path interpolant.
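The shape of this backward recurrence can be sketched in C on a deliberately simple toy domain. This is not the Sep domain of the paper: as an assumption for illustration, a "symbolic state" is the exact value of one integer variable v, a "formula" is a lower bound b read as v >= b, a "command" is an increment v := v + c, and itp is just the weakest precondition. The names exec_cmd, entails, itp, and path_interpolant are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (assumptions for illustration, not the paper's domain). */
typedef int state;    /* S_k: the exact value of v          */
typedef int formula;  /* I_k: a lower bound b, read v >= b  */
typedef int command;  /* e_{k+1}: the increment v := v + c  */

/* exec(c, S): strongest postcondition in the toy domain. */
static state exec_cmd(command c, state s) { return s + c; }

/* S |= I in the toy domain. */
static bool entails(state s, formula i) { return s >= i; }

/* itp(S, c, I'): here the weakest precondition of v >= I' under v := v + c.
 * It ignores S; the procedure in the text instead uses S to guide
 * bounded abduction. */
static formula itp(state s, command c, formula post) {
    (void)s;
    return post - c;
}

/* Fills S[0..n] by forward execution and I[0..n] by the recurrence
 * I_n = P, I_k = itp(S_k, e_{k+1}, I_{k+1}); e[k] holds e_{k+1}.
 * Returns false if S_n does not entail P. */
static bool path_interpolant(size_t n, const command *e, state s0, formula P,
                             state *S, formula *I) {
    assert(S != NULL && I != NULL && (n == 0 || e != NULL));
    S[0] = s0;
    for (size_t k = 0; k < n; k++) S[k + 1] = exec_cmd(e[k], S[k]);
    if (!entails(S[n], P)) return false;
    I[n] = P;
    for (size_t k = n; k-- > 0; ) I[k] = itp(S[k], e[k], I[k + 1]);
    return true;
}
```

The forward loop corresponds to computing the symbolic execution sequence and the backward loop to the interpolation phase. In the toy domain each I_k is entailed by S_k by construction, mirroring the first condition of Definition 1.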

4.2 Bounded Abduction 

Our algorithm for spatial interpolation is based on an abduction procedure. 
Abduction refers to the inference of explanatory hypotheses from observations 
(in contrast to deduction, which derives conclusions from given hypotheses). The 
variant of abduction we employ in this paper, which we call bounded abduction, is 
simultaneously a form of abductive and deductive reasoning. Seen as a variant of 
abduction, bounded abduction adds a constraint that the abduced hypothesis be 
at least weak enough to be derivable from a given hypothesis. Seen as a variant 
of deduction, bounded abduction adds a constraint that the deduced conclusion
be at least strong enough to imply some desired conclusion. Formally, we define
bounded abduction as follows:

Definition 3 (Bounded abduction). Let $L, M, R$ be Sep formulas, and let $X$
be a set of variables. A solution to the bounded abduction problem

$$L \vdash (\exists X.\ M * [\ ]) \vdash R$$

is a Sep formula $A$ such that $L \models (\exists X.\ M * A) \models R$.

Note how, in contrast to bi-abduction, where a solution is a pair of formulas,
one constrained from above and one from below, a solution to bounded abduction
problems is a single formula that is simultaneously constrained from above and
below. The fixed lower and upper bounds in our formulation of abduction give
considerable guidance to solvers, in contrast to bi-abduction, where the bounds
are part of the solution.

We present our bounded abduction algorithm later in the paper. For the remainder
of this section, we will treat bounded abduction as a black box, and use
$L \vdash (\exists X.\ M * [A]) \vdash R$ to denote that $A$ is a solution to the bounded abduction problem.

4.3 Computing Spatial Interpolants 

We now proceed to describe our algorithm for spatial interpolation. Given a
command $c$ and Sep formulas $S$ and $I'$ such that $\mathsf{exec}(c, S) \vdash I'$, this algorithm must
compute a Sep formula $\mathsf{itp}(S, c, I')$ that satisfies the conditions of Definition 2.
Examples 1 and 2 below illustrate this procedure.

This algorithm is defined by cases based on the command c. We present the 
cases for the spatial commands; the corresponding data commands are similar. 

Allocate Suppose $c$ is x := new(n, m). We take $\mathsf{itp}(S, c, I') = (\exists x.\ A)$, where
$A$ is obtained as a solution to $\mathsf{exec}(c, S) \vdash (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}] * [A]) \vdash I'$, and $\vec{a}$
and $\vec{z}$ are vectors of fresh variables of length n and m, respectively.

Deallocate Suppose $c$ is free(x). We take $\mathsf{itp}(S, c, I') = (\exists \vec{a}, \vec{z}.\ I' * x \mapsto [\vec{a}, \vec{z}])$,
where $\vec{a}$ and $\vec{z}$ are vectors of fresh variables whose lengths are determined by the
unique heap cell which is allocated to x in $S$.

Assignment Suppose $c$ is x := E. We take $\mathsf{itp}(S, c, I') = I'[E/x]$.

Store Suppose $c$ is x->N_i := E. We take $\mathsf{itp}(S, c, I') = (\exists \vec{a}, \vec{z}.\ A * x \mapsto [\vec{a}, \vec{z}])$,
where $A$ is obtained as a solution to $\mathsf{exec}(c, S) \vdash (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}[E/z_i]] * [A]) \vdash I'$
and where $\vec{a}$ and $\vec{z}$ are vectors of fresh variables whose lengths are determined
by the unique heap cell which is allocated to x in $S$.

Example 1. Suppose that $S$ is $t \mapsto [4, y, \mathsf{null}] * x \mapsto [2, \mathsf{null}, \mathsf{null}]$ where the cells
have one data and two pointer fields, $c$ is t->N_0 := x, and $I'$ is $\mathsf{bt}(t)$. Then we
can compute $\mathsf{exec}(c, S) = t \mapsto [4, x, \mathsf{null}] * x \mapsto [2, \mathsf{null}, \mathsf{null}]$, and then solve the
bounded abduction problem

$$\mathsf{exec}(c, S) \vdash (\exists a, z_1.\ t \mapsto [a, x, z_1] * [\ ]) \vdash I'.$$

One possible solution is $A = \mathsf{bt}(x) * \mathsf{bt}(z_1)$, which yields

$$\mathsf{itp}(S, c, I') = (\exists a, z_0, z_1.\ t \mapsto [a, z_0, z_1] * \mathsf{bt}(z_1) * \mathsf{bt}(x)).$$


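Load Suppose $c$ is y := x->N_i. Suppose that $\vec{a}$ and $\vec{z}$ are vectors of fresh variables
of lengths $|\vec{A}|$ and $|\vec{E}|$ where $S$ is of the form $\Pi : \Sigma * w \mapsto [\vec{A}, \vec{E}]$ and
$\Pi : \Sigma * w \mapsto [\vec{A}, \vec{E}] \vdash x = w$ (this is the condition under which $\mathsf{exec}(c, S)$ is defined,
see Fig. 5). Let $y'$ be a fresh variable, and define
$\overline{S} = (y = z_i[y'/y]) * (\Pi : \Sigma * w \mapsto [\vec{a}, \vec{z}])[y'/y]$.
Note that $\overline{S} \vdash (\exists y'.\ \overline{S}) = \mathsf{exec}(c, S) \vdash I'$.

We take $\mathsf{itp}(S, c, I') = (\exists \vec{a}, \vec{z}.\ A[z_i/y, y/y'] * x \mapsto [\vec{a}, \vec{z}])$ where $A$ is obtained
as a solution to $\overline{S} \vdash (\exists \vec{a}, \vec{z}.\ x[y'/y] \mapsto [\vec{a}, \vec{z}] * [A]) \vdash I'$.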

Example 2. Suppose that S is y = t : y ↦ [1, null, x] * x ↦ [5, null, null], c is y := y->N_1, and I' is y ≠ null : bt(t). Then S̄ is

    y = x ∧ y' = t : y' ↦ [1, null, x] * x ↦ [5, null, null]

We can then solve the bounded abduction problem

    S̄ ⊢ (∃a, z_0, z_1. y' ↦ [a, z_0, z_1] * [ ]) ⊢ I'

A possible solution is y ≠ null ∧ y' = t : bt(z_0) * bt(z_1), yielding

    itp(S, c, I') = (∃a, z_0, z_1. z_1 ≠ null ∧ y = t : bt(z_0) * bt(z_1) * y ↦ [a, z_0, z_1]) .

Assumptions The interpolation rules defined up to this point cannot introduce recursive predicates, in the sense that if I' is a *-conjunction of points-to predicates then so is itp(S, c, I').* A *-conjunction of points-to predicates is exact in the sense that it gives the full layout of some part of the heap. The power of recursive predicates lies in their ability to be abstract rather than exact, and describe only the shape of the heap rather than its exact layout. It is a special circumstance that {P} c {I'} holds when I' is exact in this sense and P is not: intuitively, it means that by executing c we somehow gain information about the program state, which is precisely the case for assume commands.

For an example of how spatial interpolation can introduce a recursive predicate at an assume command, consider the problem of computing an interpolant

    itp(S, assume(x ≠ null), (∃a, z. x ↦ [a, z] * true))

where S = x ↦ [d, y] * y ↦ [d', null]: a desirable interpolant may be ls(x, null) * true. The disequality introduced by the assumption ensures that one of the cases of the recursive predicate ls(x, null) (where the list from x to null is empty) is impossible, which implies that the other case (where x is allocated) must hold.

Towards this end, we now define an auxiliary function intro which we will use to introduce recursive predicates for the assume interpolation rules. Let P, Q be Sep formulas such that P ⊢ Q, let Z be a recursive predicate and E⃗ be a vector of heap terms. We define intro(Z, E⃗, P, Q) as follows: if P ⊢ (∃∅. Z(E⃗) * [A]) ⊢ Q has a solution and A ⊬ Q, define intro(Z, E⃗, P, Q) = Z(E⃗) * A. Otherwise, define intro(Z, E⃗, P, Q) = Q.

Intuitively, the abduction problem has a solution when P implies Z(E⃗) and Z(E⃗) can be excised from Q. The condition A ⊬ Q is used to ensure that the excision from Q is non-trivial (i.e., the part of the heap that satisfies Z(E⃗) "consumes" some heaplet of Q).

* But if I' does contain recursive predicates, then itp(S, c, I') may also.

To define the interpolation rule for assumptions, suppose c is assume(E ≠ F) (the case of equality assumptions is similar). Letting {(Z_i, E⃗_i)}_{1≤i≤n} be an enumeration of the (finitely many) possible choices of Z and E⃗, we define a formula M to be the result of applying intro to I' over all possible choices of Z and E⃗:

    M = intro(Z_1, E⃗_1, S ∧ E ≠ F, intro(Z_2, E⃗_2, S ∧ E ≠ F, ...))

where the innermost occurrence of intro in this definition is intro(Z_n, E⃗_n, S ∧ E ≠ F, I'). Since intro preserves entailment (in the sense that if P ⊢ Q then P ⊢ intro(Z, E⃗, P, Q)), we have that S ∧ E ≠ F ⊢ M. From a proof of S ∧ E ≠ F ⊢ M, we can construct a formula M' which is entailed by S and differs from M only in that it renames variables and exposes additional equalities and disequalities implied by S, and take itp(S, c, I') to be this M'.

The construction of M' from M is straightforward but tedious. The procedure is detailed in Appendix D; here, we will just give an example to give intuition on why it is necessary. Suppose that S is x = w : y ↦ z and I' is ls(w, z), and c is assume(x = y). Since there is no opportunity to introduce new recursive predicates in I', M is simply ls(w, z). However, M is not a valid interpolant since S ⊭ M, so we must expose the equality x = w and rename w to y in the list segment in M' = x = w : ls(y, z).

In practice, it is undesirable to enumerate all possible choices of Z and E⃗ when constructing M (considering that if there are k in-scope data terms, a recursive predicate of arity n requires enumerating k^n choices for E⃗). A reasonable heuristic is to let Π be the strongest pure formula implied by S, and enumerate only those combinations of Z and E⃗ such that there is some Π' : Σ' ∈ cases(Z(R, x⃗)) such that Π'[E⃗/x⃗] ∧ Π ∧ x ≠ y is unsatisfiable. For example, for assume(x ≠ y), this heuristic means that we enumerate only (x, y) and (y, x) (i.e., we attempt to introduce a list segment from x to y and from y to x).
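The gap between full enumeration and the heuristic can be illustrated with a small Python sketch. It makes simplifying assumptions that are not from the paper: only the list-segment predicate ls(u, v) is considered, its empty case is taken to have pure part u = v, and no further equalities from the pure part of S are used, so the unsatisfiability test reduces to a syntactic match. The function names are hypothetical.

```python
from itertools import product


def all_argument_tuples(terms, arity):
    """Every way to instantiate a predicate of the given arity: k**arity tuples."""
    return list(product(terms, repeat=arity))


def ls_candidates(terms, x, y):
    """Keep (u, v) only if the empty case of ls(u, v), whose pure part is u = v,
    is contradicted by the assumed disequality x != y. Simplification: no other
    equalities are taken from the pure part of S, so only syntactic matches count."""
    return [(u, v) for (u, v) in all_argument_tuples(terms, 2) if {u, v} == {x, y}]


terms = ["x", "y", "z", "null"]
print(len(all_argument_tuples(terms, 2)))   # k = 4 terms, arity 2
print(ls_candidates(terms, "x", "y"))       # candidates kept for assume(x != y)
```

With four in-scope terms the full enumeration has 4^2 = 16 argument pairs, while the filter keeps only the two pairs named in the text.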

We conclude this section with a theorem stating the correctness of our spatial 
interpolation procedure. 

Theorem 1. Let S and I' be Sep formulas and let c be a command such that exec(c, S) ⊢ I'. Then itp(S, c, I') is a spatial interpolant for S, c, and I'.

5 Spatial Interpolation Modulo Theories 

We now consider the problem of refining (or strengthening) a given separation 
logic proof of memory safety with information about (non-spatial) data. This 
refinement procedure results in a proof of a conclusion stronger than can be 
proved by reasoning about the heap alone. In view of our example from Fig. 
this section addresses how to derive the third sequence (Spatial Interpolants 
Modulo Theories) from the second (Spatial Interpolants). 

The input to our spatial interpolation modulo theories procedure is a path π, a separation logic (Sep) proof ζ of the triple {true : emp} π {true : true} (i.e.,


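Entailment rules

Star
    C0 ▸ Π ∧ Φ : Σ0 ⊢ Π' ∧ Φ' : Σ'0      C1 ▸ Π ∧ Φ : Σ1 ⊢ Π' ∧ Φ' : Σ'1
    ----------------------------------------------------------------------
    C0; C1 ▸ Π ∧ Φ : Σ0 * Σ1 ⊢ Π' ∧ Φ' : Σ'0 * Σ'1

Points-to
    Π ⊨ Π'
    ----------------------------------------------------------------------
    Φ' ← Φ ▸ Π ∧ Φ : E ↦ [A, F] ⊢ Π' ∧ Φ' : E ↦ [A, F]

Fold
    C ▸ Π : Σ ⊢ Π' : Σ' * P[τ/R, E/x]      P ∈ cases(Z(R, x))
    ----------------------------------------------------------------------
    C ▸ Π : Σ ⊢ Π' : Σ' * Z(τ, E)

Unfold
    C1 ▸ Π : Σ * P1[τ/R, E/x] ⊢ Π' : Σ'
    ...
    Cn ▸ Π : Σ * Pn[τ/R, E/x] ⊢ Π' : Σ'      {P1, ..., Pn} = cases(Z(R, x))
    ----------------------------------------------------------------------
    C1; ...; Cn ▸ Π : Σ * Z(τ, E) ⊢ Π' : Σ'

Predicate
    Π ⊨ Π'      where τ_i = (λa_i.Ψ_i) and τ'_i = (λa_i.Ψ'_i)
    ----------------------------------------------------------------------
    Φ' ← Φ; Ψ'_1 ← Ψ_1 ∧ Φ; ...; Ψ'_|τ| ← Ψ_|τ| ∧ Φ ▸ Π ∧ Φ : Z(τ, E) ⊢ Π' ∧ Φ' : Z(τ', E)

Execution rules

Data-Assume
    C ▸ P ∧ φ ⊢ Q
    ----------------------------------------------------------------------
    C ▸ {P} assume(φ) {Q}

Free
    C ▸ P ⊢ Π ∧ Φ : Σ * x ↦ [A, E]
    ----------------------------------------------------------------------
    C ▸ {P} free(x) {Π ∧ Φ : Σ}

Sequence
    C0 ▸ {P} π0 {Ô}      C1 ▸ {Ô} π1 {Q}
    ----------------------------------------------------------------------
    C0; C1 ▸ {P} π0; π1 {Q}

Data-Load
    C0 ▸ P ⊢ (∃X. Π ∧ Φ̂ : Σ̂ * x ↦ [Â, E])
    C1 ▸ (∃X, a'. Π[a'/a] ∧ Φ̂[a'/a] ∧ a = A_i[a'/a] : (Σ̂ * x ↦ [Â, E])[a'/a]) ⊢ Q
    ----------------------------------------------------------------------
    C0; C1 ▸ {P} a := x->D_i {Q}

Data-Assign
    C ▸ (∃a'. Π ∧ Φ[a'/a] ∧ a = Æ[a'/a] : Σ[a'/a]) ⊢ Q
    ----------------------------------------------------------------------
    C ▸ {Π ∧ Φ : Σ} a := Æ {Q}

Data-Store
    C0 ▸ P ⊢ (∃X. Π ∧ Φ̂ : Σ̂ * x ↦ [Â, E])
    C1 ▸ (∃X, a'. Π ∧ Φ̂ ∧ a' = Æ : Σ̂ * x ↦ [Â[a'/A_i], E]) ⊢ Q
    ----------------------------------------------------------------------
    C0; C1 ▸ {P} x->D_i := Æ {Q}

Alloc
    C ▸ (∃x', a, x⃗. Π[x'/x] ∧ Φ : Σ[x'/x] * x ↦ [a, x⃗]) ⊢ Q
    ----------------------------------------------------------------------
    C ▸ {Π ∧ Φ : Σ} x := new(n, m) {Q}


Fig. 6. Constraint generation.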



Refined memory safety proof ζ'

    {R0(i) : true}
    i = nondet(); x = null
    {R1(i) : ls((λν.R_ls1(ν, i)), x, null) * true}
    assume(i != 0); i--;
    {R2(i) : ls((λν.R_ls2(ν, i)), x, null) * true}
    assume(i == 0)
    {R3(i) : ls((λν.R_ls3(ν, i)), x, null) * true}
    assume(x != null)
    {(∃d', y. R4(i, d') : x ↦ [d', y] * true)}

Constraint system C

    R0(i') ← true
    R1(i') ← R0(i)
    R2(i') ← R1(i) ∧ i ≠ 0 ∧ i' = i + 1
    R3(i) ← R2(i) ∧ i = 0
    R4(i, d') ← R3(i) ∧ R_ls3(d', i)
    R_ls2(ν, i') ← R1(i) ∧ R_ls1(ν, i) ∧ i ≠ 0 ∧ i' = i + 1
    R_ls2(ν, i') ← R1(i) ∧ ν = i ∧ i ≠ 0 ∧ i' = i + 1
    R_ls3(ν, i) ← R2(i) ∧ R_ls2(ν, i) ∧ i = 0
    d' ≥ 0 ← R4(i, d')

Solution σ

    R0(i) : true
    R1(i) : true
    R2(i) : true
    R3(i) : true
    R4(i, d') : d' ≥ 0
    R_ls1(ν, i) : ν ≥ i
    R_ls2(ν, i) : ν ≥ i
    R_ls3(ν, i) : ν ≥ 0


Fig. 7. Example constraints.


a memory safety proof for π), and a postcondition φ. The goal is to transform ζ into an RSep proof of the triple {true : emp} π {φ : true}. The high-level operation of our procedure is as follows. First, we traverse the memory safety proof ζ and build (1) a corresponding refined proof ζ' where refinements may contain second-order variables, and (2) a constraint system C which encodes logical dependencies between the second-order variables. We then attempt to find a solution to C, which is an assignment of data formulas to the second-order variables such that all constraints are satisfied. If we are successful, we use the solution to instantiate the second-order variables in ζ', which yields a valid RSep proof of the triple {true : emp} π {φ : true}.

Horn Clauses The constraint system produced by our procedure is a recursion-free set of Horn clauses, which can be solved efficiently using existing first-order interpolation techniques (see [33] for a detailed survey). Following [17], we define a query to be an application Q(a) of a second-order variable Q to a vector of (data) variables, and define an atom to be either a data formula φ ∈ DFormula or a query Q(a). A Horn clause is of the form

    h ← b_1 ∧ ··· ∧ b_N

where each of h, b_1, ..., b_N is an atom. In our constraint generation rules, it will be convenient to use a more general form which can be translated to Horn clauses: we will allow constraints of the form

    h_1 ∧ ··· ∧ h_M ← b_1 ∧ ··· ∧ b_N

(shorthand for the set of Horn clauses {h_i ← b_1 ∧ ··· ∧ b_N}_{1≤i≤M}) and we will allow queries to be of the form Q(A) (i.e., take arbitrary data terms as arguments rather than variables). If C and C' are sets of constraints, we will use C; C' to denote their union.

A solution to a system of Horn clauses C is a map σ that assigns each second-order variable Q of arity k a DFormula Q^σ with free variables drawn from ν = (ν_1, ..., ν_k) such that for each clause

    h ← b_1 ∧ ··· ∧ b_N

in C the implication

    ∀A. (h^σ ⇐ (∃B. b_1^σ ∧ ··· ∧ b_N^σ))

holds, where A is the set of free variables in h and B is the set of variables free in some b_i but not in h. In the above, for any data formula φ, φ^σ is defined to be φ, and for any query Q(a), Q(a)^σ is defined to be Q^σ[a_1/ν_1, ..., a_k/ν_k] (where k is the arity of Q).
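To make the definition of a solution concrete, the following Python sketch checks a candidate σ against three of the clauses from Fig. 7. It is a bounded sanity check over a small integer range, not a proof, and everything about the encoding (predicates as lambdas, the DOMAIN range, the helper name satisfies) is an assumption made for illustration. In these three clauses every body variable also occurs in the head or is shared, so the existential block ∃B is empty.

```python
from itertools import product

DOMAIN = range(-3, 4)  # small finite test range; an assumption for the sketch

# Candidate solution sigma from Fig. 7, as Python predicates.
sigma = {
    "R2": lambda i: True,
    "R3": lambda i: True,
    "R4": lambda i, d: d >= 0,
    "Rls3": lambda nu, i: nu >= 0,
}

# Each clause is (head, body); both take the interpretation s and values for (i, d).
clauses = [
    (lambda s, i, d: s["R3"](i),    lambda s, i, d: s["R2"](i) and i == 0),
    (lambda s, i, d: s["R4"](i, d), lambda s, i, d: s["R3"](i) and s["Rls3"](d, i)),
    (lambda s, i, d: d >= 0,        lambda s, i, d: s["R4"](i, d)),
]


def satisfies(s, clauses):
    """True if body implies head for every clause at every point of DOMAIN x DOMAIN."""
    return all(
        head(s, i, d) or not body(s, i, d)
        for head, body in clauses
        for i, d in product(DOMAIN, repeat=2)
    )


print(satisfies(sigma, clauses))
weak = dict(sigma, Rls3=lambda nu, i: True)  # drop the refinement on list elements
print(satisfies(weak, clauses))
```

The second call shows why the refinement on the list elements matters: once R_ls3 is weakened to true, the clause R4(i, d') ← R3(i) ∧ R_ls3(d', i) is violated for negative d'.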

Constraint Generation Calculus We will present our algorithm for spatial interpolation modulo theories as a calculus whose inference rules mirror the ones of separation logic. The calculus makes use of the same syntax used in recursive predicate definitions in Sec. We use τ to denote a refinement term and Φ to denote a refined formula. The calculus has two types of judgements. An entailment judgement is of the form

    C ▸ (∃X. Π ∧ Φ : Σ) ⊢ (∃X'. Π' ∧ Φ' : Σ')

where Π, Π' are equational pure assertions over heap terms, Σ, Σ' are refined spatial assertions, Φ, Φ' are refined formulas, and C is a recursion-free set of Horn clauses. Such an entailment judgement should be read as "for any solution σ to the set of constraints C, (∃X. Π ∧ Φ^σ : Σ^σ) entails (∃X'. Π' ∧ Φ'^σ : Σ'^σ)," where Φ^σ is Φ with all second-order variables replaced by their data formula assignments in σ (and similarly for Σ^σ).

Similarly, an execution judgement is of the form

    C ▸ {(∃X. Π ∧ Φ : Σ)} π {(∃X'. Π' ∧ Φ' : Σ')}

where π is a path and X, X', Π, Π', Φ, Φ', Σ, Σ', and C are as above. Such an execution judgement should be read as "for any solution σ to the set of constraints C,

    {(∃X. Π ∧ Φ^σ : Σ^σ)} π {(∃X'. Π' ∧ Φ'^σ : Σ'^σ)}

is a valid triple."

Let π be a path, let ζ be a separation logic proof of the triple {true : emp} π {true : true} (i.e., a memory safety proof for π), and let φ ∈ DFormula be a postcondition. Given these inputs, our algorithm operates as follows. We use v to denote a vector of all data-typed program variables. The triple is rewritten with refinements by letting R and R' be fresh second-order variables of arity |v| and conjoining R(v) and R'(v) to the pre and post. By recursing on ζ, at each step applying the appropriate rule from our calculus in Fig. 6, we derive a judgement

    ζ'
    ----------------------------------------------------
    C ▸ {true ∧ R(v) : emp} π {true ∧ R'(v) : true}

and then compute a solution σ to the constraint system

    C; R(v) ← true; φ ← R'(v)

(if one exists). The algorithm then returns ζ'^σ, the proof obtained by applying the substitution σ to ζ'.



Intuitively, our algorithm operates by recursing on a separation logic proof, introducing refinements into formulas on the way down, and building a system of constraints on the way up. Each inference rule in the calculus encodes both the downwards and upwards step of this algorithm. For example, consider the Fold rule of our calculus: we will illustrate the intended reading of this rule with a concrete example. Suppose that the input to the algorithm is a derivation of the following form:

    ζ0
    x ↦ [a, null] ⊢ (∃b, y. x ↦ [b, y] * ls(y, null))
    ------------------------------------------------------------ Fold
    Q(i) : x ↦ [a, null] ⊢ R(i) : ls((λa.S(i, a)), x, null)

(i.e., a derivation where the last inference rule is an application of Fold, and the conclusion has already been rewritten with refinements). We introduce refinements in the premise and recurse on the following derivation:

    ζ0
    Q(i) : x ↦ [a, null] ⊢ (∃b, y. R(i) ∧ S(i, b) : x ↦ [b, y] * ls((λa.S(i, a)), y, null))

The result of this recursive call is a refined derivation ζ0' as well as a constraint system C. We then return both (1) the refined derivation obtained by catenating the conclusion of the Fold rule onto ζ0' and (2) the constraint system C.

A crucial point of our algorithm is hidden inside the hat notation in Fig. 6 (e.g., Ô in Sequence): this notation is used to denote the introduction of fresh second-order variables. For many of the inference rules (such as Fold), the refinements which appear in the premises follow fairly directly from the refinements which appear in the conclusion. However, in some rules entirely new formulas appear in the premises which do not appear in the conclusion (e.g., in the Sequence rule in Fig. 6, the intermediate assertion O is an arbitrary formula which has no obvious relationship to the precondition P or the postcondition Q). We refine such a formula O by introducing a fresh second-order variable for the pure assertion and for each refinement term that appears in O. The following offers a concrete example.

Example 3. Consider the trace π in Fig. Suppose that we are given a memory safety proof for π which ends in an application of the Sequence rule:

    {true : emp} π0 {true : ls(x, null)}
    {true : ls(x, null)} π1 {(∃b, y. true : x ↦ [b, y])}
    ------------------------------------------------------------ Sequence
    {Q(i) : emp} π0; π1 {(∃b, y. R(i, b) : x ↦ [b, y])}

where π is decomposed as π0; π1, π0 is the path from 1 to 3, and π1 is the path from 3 to 4. Let O = true : ls(x, null) denote the intermediate assertion which appears in this proof. To derive Ô, we introduce two fresh second-order variables, S (with arity 1) and T (with arity 2), and define Ô = S(i) : ls((λa.T(i, a)), x, null). The resulting inference is as follows:

    {Q(i) : emp} π0 {S(i) : ls((λa.T(i, a)), x, null)}
    {S(i) : ls((λa.T(i, a)), x, null)} π1 {(∃b, y. R(i, b) : x ↦ [b, y])}
    ------------------------------------------------------------
    {Q(i) : emp} π0; π1 {(∃b, y. R(i, b) : x ↦ [b, y])}








The following example provides a simple demonstration of our constraint 
generation procedure: 

Example 4. Recall the example in Fig. of Sec. The row of spatial interpolants in Fig. is a memory safety proof ζ of the program path. Fig. 7 shows the refined proof ζ', which is the proof ζ with second-order variables that act as placeholders for data formulas. For the sake of illustration, we have simplified the constraints by skipping a number of intermediate annotations in the Hoare-style proof.

The constraint system C specifies the logical dependencies between the introduced second-order variables in ζ'. For instance, the relation between R2 and R3 is specified by the Horn clause R3(i) ← R2(i) ∧ i = 0, which takes into account the constraint imposed by assume(i == 0) in the path. The Horn clause d' ≥ 0 ← R4(i, d') specifies the postcondition defined by the assertion assert(x->D >= 0), which states that the value of the data field of the node x should be ≥ 0.

Replacing second-order variables in ζ' with their respective solutions in σ produces a proof that the assertion at the end of the path holds (last row of Fig.).

Soundness and Completeness The key result regarding the constraint sys¬ 
tems produced by these judgements is that any solution to the constraints yields 
a valid refined proof. The formalization of the result is the following theorem. 

Theorem 2 (Soundness). Suppose that π is a path, ζ is a derivation of the judgement C ▸ {P} π {Q}, and that σ is a solution to C. Then ζ^σ, the proof obtained by applying the substitution σ to ζ, is a (refined) separation logic proof of {P^σ} π {Q^σ}.

Another crucial result for our counterexample generation strategy is a kind of 
completeness theorem, which effectively states that the strongest memory safety 
proof always admits a refinement. 

Theorem 3 (Completeness). Suppose that π is a memory-feasible path and ζ is a derivation of the judgement C ▸ {R0(v) : emp} π {R1(v) : true} obtained by symbolic execution. If φ is a data formula such that {true : emp} π {φ : true} holds, then there is a solution σ to C such that R1^σ(v) ⇒ φ.

6 Bounded Abduction 

In this section, we discuss our algorithm for bounded abduction. Given a bounded 
abduction problem 

    L ⊢ (∃X. M * [ ]) ⊢ R

we would like to find a formula A such that L ⊢ (∃X. M * A) ⊢ R. Our algorithm
is sound but not complete: it is possible that there exists a solution to the 
bounded abduction problem, but our procedure cannot find it. In fact, there 
is in general no complete procedure for bounded abduction, as a consequence 
of the fact that we do not pre-suppose that our proof system for entailment is 
complete, or even that entailment is decidable. 


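Empty
    Π ⊨ Π'
    ------------------------------------------------------------
    Π : [emp]^c ⊢ Π' : ([emp]^c ◁ emp)

Star
    Π : Σ0 ⊢ Π' : Σ'0      Π : Σ1 ⊢ Π' : Σ'1
    ------------------------------------------------------------
    Π : Σ0 * Σ1 ⊢ Π' : Σ'0 * Σ'1

Points-to
    Π ⊨ Π'
    ------------------------------------------------------------
    Π : [E ↦ [a, F]]^c ⊢ Π' : ([E ↦ [a, F]]^c ◁ E ↦ [a, F])

True
    Π ⊨ Π'
    ------------------------------------------------------------
    Π : Σ ⊢ Π' : ([true]^c ◁ true)

Substitution
    Π[E/x] : Σ[E/x] ⊢ Π'[E/x] : Σ'[E/x]      Π ⊨ x = E
    ------------------------------------------------------------
    Π : Σ ⊢ Π' : Σ'

∃-right
    P ⊢ Q[Æ/x]
    ------------------------------------------------------------
    P ⊢ (∃x. Q)


Fig. 8. Coloured strengthening. All primed variables are chosen fresh.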


High level description Our algorithm proceeds in three steps: 

1. Find a colouring of L. This is an assignment of a colour, either red or blue, to each heaplet appearing in L. Intuitively, red heaplets are used to satisfy M, and blue heaplets are left over. This colouring can be computed by recursion on a proof of L ⊢ (∃X. M * true).

2. Find a coloured strengthening Π : [M']^r * [A]^b of R. (We use the notation [Σ]^r or [Σ]^b to denote a spatial formula Σ of red or blue colour, respectively.) Intuitively, this is a formula that (1) entails R and (2) is coloured in such a way that the red heaplets correspond to the red heaplets of L, and the blue heaplets correspond to the blue heaplets of L. This coloured strengthening can be computed by recursion on a proof of L ⊢ R using the colouring of L computed in step 1.

3. Check Π' : M * A ⊨ R, where Π' is the strongest pure formula implied by L. This step is necessary because M may be weaker than M'. If the entailment check fails, then our algorithm fails to compute a solution to the bounded abduction problem. If the entailment check succeeds, then Π'' : A is a solution, where Π'' is the set of all equalities and disequalities in Π' which were actually used in the proof of the entailment Π' : M * A ⊨ R (roughly, all those equalities and disequalities which appear in the leaves of the proof tree, plus the equalities that were used in some instance of the Substitution rule).

First, we give an example to illustrate these high-level steps: 

Example 5. Suppose we want to solve the following bounded abduction problem:

    L ⊢ ls(x, y) * [ ] ⊢ R

where L = x ↦ [a, y] * y ↦ [b, null] and R = (∃z. x ↦ [a, z] * ls(y, null)). Our algorithm operates as follows:

1. Colour L: [x ↦ [a, y]]^r * [y ↦ [b, null]]^b

2. Colour R: (∃z. [x ↦ [a, z]]^r * [ls(y, null)]^b)

3. Prove the entailment

    x ≠ null ∧ y ≠ null ∧ x ≠ y : ls(x, y) * ls(y, null) ⊨ R

This proof succeeds, and uses the pure assertion x ≠ y.

Our algorithm computes x ≠ y : ls(y, null) as the solution to the bounded abduction problem.


We now elaborate our bounded abduction algorithm. We assume that L is quantifier free (without loss of generality, since quantified variables can be Skolemized) and saturated in the sense that for any pure formula Π', if L ⊢ Π', where L = Π : Σ, then Π ⊢ Π'.

Step 1 The first step of the algorithm is straightforward. If we suppose that there exists a solution, A, to the bounded abduction problem, then by definition we must have L ⊨ (∃X. M * A). Since (∃X. M * A) ⊨ (∃X. M * true), we must also have L ⊨ (∃X. M * true). We begin step 1 by computing a proof of L ⊢ (∃X. M * true). If we fail, then we abort the procedure and report that we cannot find a solution to the abduction problem. If we succeed, then we can colour the heaplets of L as follows: for each heaplet E ↦ [A, F] in L, either E ↦ [A, F] was used in an application of the Points-to axiom in the proof of L ⊢ (∃X. M * true) or not. If yes, we colour E ↦ [A, F] red; otherwise, we colour it blue. We denote a heaplet H coloured by a colour c by [H]^c.

Step 2 The second step is to find a coloured strengthening of R. Again, supposing that there is some solution A to the bounded abduction problem, we must have L ⊨ (∃X. M * A) ⊨ R, and therefore L ⊨ R. We begin step 2 by computing a proof of L ⊢ R. If we fail, then we abort. If we succeed, then we define a coloured strengthening of R by recursion on the proof of L ⊢ R. Intuitively, this algorithm operates by inducing a colouring on points-to predicates in the leaves of the proof tree from the colouring of L (via the Points-to rule in Fig. 8) and then only folding recursive predicates when all the folded heaplets have the same colour.

More formally, for each formula P appearing as the consequent of some sequent in a proof tree, our algorithm produces a mapping from heaplets in P to coloured spatial formulas. The mapping is represented using the notation (Σ ◁ H), which denotes that the heaplet H is mapped to the coloured spatial formula Σ. For each recursive predicate Z and each (∃X. Π : H_1 * ··· * H_n) ∈ cases(Z(R, x)), we define two versions of the fold rule, corresponding to when H_1, ..., H_n are coloured homogeneously (Fold1) and heterogeneously (Fold2):

Fold1
    (Π : Σ ⊢ Π' : Σ' * ([H_1]^c ◁ H_1) * ··· * ([H_n]^c ◁ H_n))[E/x]
    ------------------------------------------------------------
    Π : Σ ⊢ Π' : Σ' * ([Z(E)]^c ◁ Z(E))

Fold2
    (Π : Σ ⊢ Π' : Σ' * (Σ'_1 ◁ H_1) * ··· * (Σ'_n ◁ H_n))[E/x]
    ------------------------------------------------------------
    Π : Σ ⊢ Π' : Σ' * (Σ'_1 * ··· * Σ'_n ◁ Z(E))




The remaining rules for our algorithm are presented formally in Fig. 8.† To illustrate how this algorithm works, consider the Fold1 and Fold2 rules. If a given (sub-)proof finishes with an instance of Fold that folds H_1 * ··· * H_n into Z(E), we begin by colouring the sub-proof of

    Π : Σ ⊢ Π' : Σ' * H_1 * ··· * H_n

This colouring process produces a coloured heaplet Σ_i for each H_i. If there is some colour c such that each Σ_i is [H_i]^c, then we apply Fold1 and Z(E) gets mapped to [Z(E)]^c. Otherwise (if there is some i such that Σ_i is not H_i or there is some i, j such that Σ_i and Σ_j have different colours), we apply Fold2, and map Z(E) to Σ_1 * ··· * Σ_n.
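The choice between the two fold rules can be sketched as a small Python function. The encoding is an assumption made for illustration only: heaplets and predicates are plain strings, a coloured spatial formula is a list of (colour, heaplet) pairs, and the name fold_mapping is hypothetical.

```python
def fold_mapping(pred, parts):
    """parts: list of (H_i, Sigma_i); each Sigma_i is a list of (colour, heaplet) pairs.
    Returns the coloured formula that the folded predicate `pred` is mapped to."""
    colours = set()
    for heaplet, sigma in parts:
        if len(sigma) != 1 or sigma[0][1] != heaplet:
            colours = None          # some Sigma_i is not simply [H_i]^c
            break
        colours.add(sigma[0][0])
    if colours is not None and len(colours) == 1:
        return [(colours.pop(), pred)]                      # Fold1: keep the predicate folded
    return [pair for _, sigma in parts for pair in sigma]   # Fold2: keep the coloured parts


# Homogeneous case: both unfolded heaplets are blue, so the fold is kept.
same = [("y ↦ [b, z]", [("blue", "y ↦ [b, z]")]),
        ("ls(z, null)", [("blue", "ls(z, null)")])]
print(fold_mapping("ls(y, null)", same))

# Heterogeneous case: one red and one blue heaplet, so the parts stay separate.
mixed = [("x ↦ [a, y]", [("red", "x ↦ [a, y]")]),
         ("ls(y, null)", [("blue", "ls(y, null)")])]
print(fold_mapping("ls(x, null)", mixed))
```

Keeping the parts separate in the heterogeneous case is what later allows the blue part of R to be read off as the abduced formula A.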

After colouring a proof, we define A to be the blue part of R. That is, if the colouring process ends with a judgement of

    Π : [Σ_1]^r * [Σ_2]^b ⊢ Π' : ([Σ_11]^r * [Σ_12]^b ◁ H_1) * ··· * ([Σ_n1]^r * [Σ_n2]^b ◁ H_n)

(where for any coloured spatial formula Σ, its partition into red and blue heaplets is denoted by [Σ_1]^r * [Σ_2]^b), we define A to be Π' : Σ_12 * ··· * Σ_n2. This choice is justified by the following lemma:

Lemma 1. Suppose that

    Π : [Σ_1]^r * [Σ_2]^b ⊢ Π' : ([Σ_11]^r * [Σ_12]^b ◁ H_1) * ··· * ([Σ_n1]^r * [Σ_n2]^b ◁ H_n)

is derivable using the rules of Fig. 8 and that the antecedent is saturated. Then the following hold:

- Π' : Σ_11 * Σ_12 * ··· * Σ_n1 * Σ_n2 ⊨ Π' : H_1 * ··· * H_n;

- Π : Σ_1 ⊨ Π' : Σ_11 * ··· * Σ_n1; and

- Π : Σ_2 ⊨ Π' : Σ_12 * ··· * Σ_n2.

Step 3 The third step of our algorithm is to check the entailment Π' : M * A ⊨ R. To illustrate why this is necessary, consider the following example:

Example 6. Suppose we want to solve the following bounded abduction problem:

    x ≠ y : x ↦ [a, y] ⊢ ls(x, y) * [ ] ⊢ x ↦ [a, y] .

In step 1, we compute the colouring x ≠ y : [x ↦ [a, y]]^r * [emp]^b of the left hand side. In step 2, we compute the colouring [x ↦ [a, y]]^r * [emp]^b of the right hand side. However, emp is not a solution to the bounded abduction problem. In fact, there is no solution to the bounded abduction problem. Intuitively, this is because M is too weak to entail the red part of the right hand side.

7 Implementation and Evaluation 

Our primary goal is to study the feasibility of our proposed algorithm. To that 
end, we implemented an instantiation of our generic algorithm with the linked 
list recursive predicate ls (as defined in Sec.) and refinements in the theory
of linear arithmetic (QF_LRA). The following describes our implementation and 
evaluation of SplInter in detail. 

† Note that some of the inference rules are missing. This is because these rules are inapplicable (in the case of Unfold and Inconsistent) or unnecessary (in the case of null-not-Lval and *-Partial), given our assumptions on the antecedent.




Implementation We implemented SplInter in the T2 safety and termination verifier. Specifically, we extended T2's front-end to handle heap-manipulating programs, and used its safety checking component (which implements McMillan's Impact algorithm) as a basis for our implementation of SplInter. To enable reasoning in separation logic, we implemented an entailment checker for RSep along with a bounded abduction procedure.

We implemented a constraint-based solver using the linear rational arithmetic interpolation techniques of Rybalchenko and Stokkermans [34] to solve the non-recursive Horn clauses generated by SplInter. Although many off-the-shelf tools for interpolation exist (e.g., [26]), we implemented our own solver for experimentation and evaluation purposes, which gives us more flexibility in controlling the forms of interpolants we are looking for. We expect that SplInter would perform even better using these highly tuned interpolation engines.

Our main goal is to evaluate the feasibility of our proposed extension of 
interpolation-based verification to heap and data reasoning, and not necessar¬ 
ily to demonstrate performance improvements against other tools. Nonetheless, 
we note that there are two tools that target similar programs: (1) Thor [22], which computes a memory safety proof and uses off-the-shelf numerical verifiers to strengthen it, and (2) XiSA [12], which combines shape and data abstract domains in an abstract interpretation framework. Thor cannot compute arbitrary refinements of recursive predicates (like the ones demonstrated here and required in our benchmarks) unless it is manually supplied with the required theory predicates. Instantiated with the right abstract data domains, XiSA can in principle handle most programs we target in our evaluation. (XiSA was unavailable for comparison.) Sec. 8 provides a detailed comparison with related work.

Benchmarks To evaluate SplInter, we used a number of linked list bench¬ 
marks that require heap and data reasoning. First, we used a number of simple 
benchmarks: listdata is similar to Fig. where a linked list is constructed and 
its data elements are later checked; twolists requires an invariant comparing 
data elements of two lists (all elements in list A are greater than those in list 
B); ptloop tests our spatial interpolation technique, where the head of the list 
must not be folded in order to ensure its data element is accessible; and refCount 
is a reference counting program, where our goal is to prove memory safety (no 
double free). For our second set of benchmarks, we used a cut-down version of 
BinChunker (http://he.fi/bchunk/), a Linux utility for converting between dif¬ 
ferent audio CD formats. BinChunker maintains linked lists and uses their data 
elements for traversing an array. Our property of interest is thus ensuring that 
all array accesses are within bounds. To test our approach, we used a number 
of modifications of BinChunker, bchunk_a to bchunk_f, where a is the simplest
benchmark and f is the most complex one. 

Benchmark    #ProvePath   Time (s)   T Time (s)   Sp. Time (s)
listdata          5          1.37       0.45          0.2
twolists          5          3.12       2.06          0.27
ptloop            3          1.03       0.28          0.15
refCount         14          1.6        0.59          0.14
bchunk_a          6          1.56       0.51          0.25
bchunk_b         18          4.78       1.7           0.2
bchunk_c         69         31.6       14.3           0.26
bchunk_d         23          9.3        4.42          0.27
bchunk_e         52         30.1       12.2           0.25
bchunk_f         57         22.4       12.0           0.25

Table 1. Results of running SplInter on our benchmark set.

Heuristics We employed a number of heuristics to improve our implementation. First, given a program path to prove correct, we attempt to find a proof similar to those of previously proven paths that traverse the same control flow locations. This is similar to the forced covering heuristic of [25], which forces path interpolants to generalize to inductive invariants. Second, our Horn clause solver uses Farkas' lemma to compute linear arithmetic interpolants. We found that minimizing the number of non-zero Farkas coefficients results in more generalizable refinements. A similar heuristic is employed by [1].
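As background for the Farkas-based heuristic, the following Python sketch shows how nonnegative coefficients that refute a pair of linear constraint sets also yield an interpolant. It is a textbook-style illustration, not SplInter's solver: the constraint sets A and B, the hand-picked coefficients, and the helper name combine are all assumptions, and a real solver would search for the coefficients (for instance by linear programming) rather than take them as given.

```python
from fractions import Fraction


def combine(ineqs, coeffs):
    """Nonnegative combination of inequalities. Each inequality is (lhs, bound),
    meaning sum(lhs[v] * v) <= bound, with lhs a dict from variable to coefficient."""
    assert all(c >= 0 for c in coeffs), "Farkas coefficients must be nonnegative"
    lhs, bound = {}, Fraction(0)
    for (terms, b), c in zip(ineqs, coeffs):
        for var, k in terms.items():
            lhs[var] = lhs.get(var, Fraction(0)) + Fraction(c) * k
        bound += Fraction(c) * b
    return {v: k for v, k in lhs.items() if k != 0}, bound


A = [({"x": 1, "y": -1}, 0), ({"y": 1}, 0)]   # x - y <= 0 and y <= 0
B = [({"x": -1}, -1)]                          # -x <= -1, i.e. x >= 1

refutation = combine(A + B, [1, 1, 1])   # all variables cancel, leaving 0 <= -1
interpolant = combine(A, [1, 1])         # the A-part alone gives x <= 0
print(refutation, interpolant)
```

The combination of the A-constraints alone, x ≤ 0, is implied by A, contradicts B, and mentions only the shared variable x. Preferring coefficient vectors with few non-zero entries tends to produce interpolants that depend on fewer constraints, which is one way to read the generalization effect reported above.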

Results Table 1 shows the results of running SplInter on our benchmark suite. Each row shows the number of calls to ProvePath (number of paths proved), the total time taken by SplInter in seconds, the time taken to generate Horn clauses and compute theory interpolants (T Time), and the time taken to compute spatial interpolants (Sp. Time). SplInter proves all benchmarks correct w.r.t. their respective properties. As expected, on simpler examples, the number of paths sampled by SplInter is relatively small (3 to 14). In the bchunk_* examples, SplInter examines up to 69 paths (bchunk_c). It is important to note that, in all benchmarks, almost half of the total time is spent in theory interpolation. We expect this can be drastically cut with the use of a more efficient interpolation engine. The time taken by spatial interpolation is very small in comparison, and becomes negligible in larger examples. The rest of the time is spent in checking entailment of RSep formulas and other miscellaneous operations.
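The share of time spent in theory interpolation can be recomputed directly from Table 1. The short Python sketch below does this arithmetic; the dictionary layout and variable names are choices made for this illustration, and the numbers are copied from the Time (s) and T Time columns.

```python
# (total time, T time) in seconds, copied from Table 1
table1 = {
    "listdata": (1.37, 0.45),
    "twolists": (3.12, 2.06),
    "ptloop":   (1.03, 0.28),
    "refCount": (1.6, 0.59),
    "bchunk_a": (1.56, 0.51),
    "bchunk_b": (4.78, 1.7),
    "bchunk_c": (31.6, 14.3),
    "bchunk_d": (9.3, 4.42),
    "bchunk_e": (30.1, 12.2),
    "bchunk_f": (22.4, 12.0),
}

# Fraction of each benchmark's total time spent in theory interpolation.
shares = {name: t / total for name, (total, t) in table1.items()}
# Aggregate fraction over the whole benchmark set.
overall = sum(t for _, t in table1.values()) / sum(total for total, _ in table1.values())

for name, share in shares.items():
    print(f"{name:10s} {share:.0%}")
print(f"{'overall':10s} {overall:.0%}")
```

On these figures the aggregate share is about 45%, with per-benchmark shares ranging from about 27% (ptloop) to about 66% (twolists).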

Our results highlight the utility of our proposed approach. Using our proto¬ 
type implementation of SplInter, we were able to verify a set of realistic pro¬ 
grams that require non-trivial combinations of heap and data reasoning. We ex¬ 
pect the performance of our prototype implementation of SplInter can greatly 
improve with the help of state-of-the-art Horn clause solvers, and more efficient
entailment checkers for separation logic. 

8 Related Work 

Abstraction Refinement for the Heap To the best of our knowledge, the 
work of Botincan et al. [7] is the only separation logic shape analysis that em¬ 
ploys a form of abstraction refinement. It starts with a family of separation logic 
domains of increasing precision, and uses spurious counterexample traces (re¬ 
ported by forward fixed-point computation) to pick a more precise domain to 
restart the analysis and (possibly) eliminate the counterexample. Limitations of 
this technique include: (1) The precision of the analysis is contingent on the set 
of abstract domains it is started with. (2) The refinement strategy (in contrast 
to SplInter) does not guarantee progress (it may explore the same path repeatedly), and may report false positives. On the other hand, given a program
path, SplInter is guaranteed to find a proof for the path or correctly declare it 
an unsafe execution. (3) Finally, it is unclear whether refinement with a powerful 
theory like linear arithmetic can be encoded in such a framework, e.g., as a set 
of domains with increasingly more arithmetic predicates. 

Podelski and Wies [SD] propose an abstraction refinement algorithm for a 
shape-analysis domain with a logic-based view of three-valued shape analysis 
(specifically, first-order logic plus transitive closure). Spurious counterexamples
are used to either refine the set of predicates used in the analysis, or refine 
an imprecise abstract transformer. The approach is used to verify specifications 
given by the user as first-order logic formulas. A limitation of the approach 
is that refinement is syntactic, and if an important recursive predicate (e.g., 
there is a list from x to null) is not explicitly supplied in the specification, it 
cannot be inferred automatically. Furthermore, abstract post computation can 
be expensive, as the abstract domain uses quantified predicates. Additionally, 
the analysis assumes a memory safe program to start, whereas, in SplInter, we 
construct a memory safety proof as part of the invariant, enabling us to detect 
unsafe memory operations that lead to undefined program behavior. 

Beyer et al. [S] propose using shape analysis information on demand to augment
numerical predicate abstraction. They use shape analysis as a backup analysis
when failing to prove a given path safe without tracking the heap, and
incrementally refine TVLA's [6] three-valued shape analysis [35] to track more
heap information as required. As with the approach of Podelski and Wies, that of
Beyer et al. makes an a priori assumption of memory safety and requires an
expensive abstract post operator.

Finally, Manevich et al. [23] give a theoretical treatment of counterexample- 
driven refinement in power set (e.g., shape) abstract domains. 

Combined Shape and Data Analyses The work of Magill et al. [22] infers 
shape and numerical invariants, and is the most closely related to ours. First, a 
separation logic analysis is used to construct a memory safety proof of the whole 
program. This proof is then instrumented by adding additional user-defined
integer parameters to the recursive predicates appearing in the proof (with
corresponding user-defined interpretations). A numerical program is generated from
this instrumented proof and checked using an off-the-shelf verification tool, which
need not reason about the heap. Our technique and that of [22] are similar in that we
both decorate separation logic proofs with additional information: in [22], the
extra information is instrumentation variables; in this paper, the extra
information is refinement predicates. Neither of these techniques properly subsumes
the other, and we believe that they may be profitably combined. An important 
difference is that we synthesize data refinements automatically from program 
paths, whereas [22] uses a fixed (though user-definable) abstraction.

A number of papers have proposed abstract domains for shape and data
invariants. Chang and Rival [T5] propose a separation logic-based abstract
domain that is parameterized by programmer-supplied invariant checkers
(recursive predicates) and a data domain for reasoning about contents of these
structures. McCloskey et al. [24] also proposed a combination of heap and numeric
abstract domains, this time using 3-valued structures for the heap. While the
approaches to combining shape and data information are significantly different, 
an advantage of our method is that it does not lose precision due to limitations 
in the abstract domain, widening, and join. 

Bouajjani et al. propose an abstract domain for list manipulating programs
that is parameterized by a data domain. They show that by varying the
data domain, one can infer invariants about list sizes, sum of elements, etc.
Quantified data automata (QDA) [TB] have been proposed as an abstract domain for
representing list invariants where the data in a list is described by a regular 
language. In m, invariants over QDA have been synthesized using language 
learning techniques from concrete program executions. Expressive logics have 
also been proposed for reasoning about heap and data EH, but have thus far 
been only used for invariant checking, not invariant synthesis. A number of
decision procedures for combinations of the singly-linked-list fragment of separation
logic with SMT theories have recently been proposed [JHl ES].

Path-based Verification A number of works proposed path-based algorithms 
for verification. Our work builds on McMillan's Impact technique [25] and
extends it to heap/data reasoning. Earlier work [19] used interpolants to compute
predicates from spurious paths in a CEGAR loop. Beyer et al. H] proposed path
invariants, where infeasible paths induce program slices that are proved correct, 
and from which predicates are mined for full program verification. Heizmann 
et al. [TB] presented a technique that uses interpolants to compute path proofs 
and generalize a path into a visibly push-down language of correct paths. In 
comparison with SplInter, all of these techniques are restricted to first-order 
invariants. 

Our work is similar to that of Itzhaky et al. [H], in the sense that we both 
generalize from bounded unrollings of the program to compute ingredients of a 
proof. However, they compute proofs in a fragment of first-order logic that can
only express linked lists and has not yet been extended to combined heap and 
data properties. 
function SplInter(P)
    S ← ∅
    loop
        π ← IsProof(S)
        if π is empty then            // S is a proof
            return found proof S
        else                          // π = v_i, ..., v_e in P does not appear in S
            κ ← ProvePath(π)
            if κ is empty then        // No proof computed for π
                return found erroneous execution π
            S' ← ∅
            for each κ' ∈ S do
                (κ, κ') ← Conj(κ, κ')
                S' ← S' ∪ {κ'}
            S ← S' ∪ {κ}

Fig. 9. Main Algorithm of SplInter. 

A From Proofs of Paths to Proofs of Programs 

In this section, we describe how SplInter constructs a proof of correctness
(i.e., unreachability of the error location $v_e$) of a program $\mathcal{P} = (V, E, v_i, v_e)$
from proofs of individual paths. We note that our algorithm is an extension of
Impact [25] to RSep proofs; we refer the reader to [25] for optimizations and
heuristics. The main difference is the procedure ProvePath, which constructs an
RSep proof for a given program path.

The Main Algorithm Figure 9 shows the main algorithm of SplInter. Elements
of the set $S$ are program paths from $v_i$ to $v_e$, annotated with RSep
formulas. For example,

$$(a_1, v_1), (a_2, v_2), \ldots, (a_n, v_n)$$

is an annotated path where (1) $\{a_j\}_j$ are RSep formulas; (2) $v_1 = v_i$ and
$v_n = v_e$; (3) for $j \in [1, n-1]$, $(v_j, v_{j+1}) \in E$; (4) for each edge $e = (v_j, v_{j+1})$,
the Hoare triple $\{a_j\}\ c_e\ \{a_{j+1}\}$ is valid, where $c_e$ denotes the command
labelling $e$; and (5) $a_n$ is false (since we are interested in proving
unreachability of $v_e$).

SplInter uses the procedure IsProof to check if the set of annotated paths $S$
represents a proof of correctness of the whole program. If not, IsProof samples a
new program path ($\pi$) that does not appear in $S$. Using the procedure ProvePath,
it tries to construct an annotation/proof (using spatial(T) interpolants) of $\pi$. If
no proof is constructed, SplInter concludes that $\pi$ is a feasible execution to
$v_e$ (or it performs unsafe memory operations). Otherwise, it uses the procedure
Conj to strengthen the proofs of all program paths in $S$, adds the annotated path
to $S$, and restarts the loop.
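The loop just described can be sketched in a few lines of Python. This is only a minimal illustration of the control flow, not the authors' implementation: the three procedures are passed in as callbacks, annotations are treated as opaque values, `None` stands for "empty", and the `max_iterations` bound is an assumption added so that the sketch always returns.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: an annotated path is a list of (annotation, location) pairs.
AnnotatedPath = List[Tuple[object, str]]
Path = List[str]


def splinter(
    is_proof: Callable[[List[AnnotatedPath]], Optional[Path]],
    prove_path: Callable[[Path], Optional[AnnotatedPath]],
    conj: Callable[[AnnotatedPath, AnnotatedPath], Tuple[AnnotatedPath, AnnotatedPath]],
    max_iterations: int = 1000,  # hypothetical bound; the real loop may not terminate
):
    """Return ("proof", S) or ("counterexample", pi), mirroring the main loop."""
    S: List[AnnotatedPath] = []
    for _ in range(max_iterations):
        pi = is_proof(S)          # None: the annotations in S already form a proof
        if pi is None:
            return ("proof", S)
        kappa = prove_path(pi)    # None: no proof could be computed for pi
        if kappa is None:
            return ("counterexample", pi)
        strengthened = []
        for other in S:           # strengthen every stored path against the new one
            kappa, other = conj(kappa, other)
            strengthened.append(other)
        S = strengthened + [kappa]
    raise RuntimeError("iteration bound reached")
```

Passing the procedures as parameters keeps the sketch independent of any particular logic; the same skeleton works whether annotations are RSep formulas or, as in a toy test, plain strings.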



 1: function ProvePath(π = v_1, ..., v_n)
      // Check if path is feasible (can reach error location v_e).
 2:   let k be largest integer s.t. exec(v_1, ..., v_k) is defined.
 3:   symbHeap ← exec(v_1, ..., v_k)
 4:   // If π is memory-infeasible,
 5:   // only compute spatial interpolants,
 6:   // as theory interpolants are not needed
 7:   if S_k ⊨ false, where symbHeap = ..., (S_k, v_k) then
 8:     let k' ≤ k be the largest integer s.t. S_k' ⊭ false
 9:     spInt ← Spatial((v_1, ..., v_k'), S_k')
10:     return spInt, (false, v_k'+1), ..., (false, v_n)
11:   refSymbHeap ← Refine(symbHeap, false)
12:   if k = 0 or refSymbHeap is undefined then
13:     return no proof found
14:
      // Path infeasible - construct proof.
15:   spInt ← Spatial((v_1, ..., v_k), true : true)
16:   spTheoryInt ← Refine(spInt, false)
17:   if spTheoryInt is undefined then
18:     return refSymbHeap, (false, v_k+1), ..., (false, v_n)
19:   return spTheoryInt, (false, v_k+1), ..., (false, v_n)

Fig. 10. Pseudocode for ProvePath. 


Note that the annotated paths in $S$ represent an Abstract Reachability Tree
(ART), as used in [25] as well as other software model checking algorithms. The
tree in this case is rooted at $v_i$, and branching represents when two paths diverge.

We will now describe SplInter’s components in more detail. 

IsProof: Checking Proofs Given a set of annotated paths $S$, we use the
procedure IsProof($S$) to check if the annotations represent a proof of the whole
program. Specifically, for each $v \in V$, we use $I(v)$ to denote the formula
$\bigvee \{ a_j \mid (a_j, v) \in \kappa,\ \kappa \in S \}$. In other words, for each location $v$ in $V$, we hypothesize an
inductive invariant $I(v)$ from our current annotations of paths passing through
$v$. We can then check if our hypotheses indeed form an inductive invariant of the
program. If so, the program is safe: since $I(v_e)$ is always false (by definition),
the fact that $I$ is an invariant implies that the post-state of any execution which
reaches $v_e$ must satisfy false, and therefore the error location $v_e$ is unreachable.
Otherwise, IsProof returns a new program path on which our hypothesized
invariant does not hold, which we use to refine our hypotheses. In practice, one
can perform IsProof implicitly and lazily by maintaining a covering (entailment)
relation [25] over the nodes of the ART.
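To make the inductiveness check concrete, here is a minimal sketch under strong simplifying assumptions: a formula is represented by the finite set of integer states that satisfy it (so disjunction is set union and entailment is set inclusion), and each program edge is given as a hypothetical step function from a state to its successor states. Unlike IsProof, the sketch reports a violated edge rather than a whole path.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

State = int
Formula = FrozenSet[State]              # assumption: a formula is its set of models
Location = str
AnnotatedPath = List[Tuple[Formula, Location]]
Edge = Tuple[Location, Location]


def hypothesized_invariant(S: List[AnnotatedPath], v: Location) -> Formula:
    """I(v): the disjunction (here, union) of all annotations attached to v in S."""
    result: Formula = frozenset()
    for kappa in S:
        for annotation, loc in kappa:
            if loc == v:
                result |= annotation
    return result


def find_violated_edge(
    S: List[AnnotatedPath],
    init_loc: Location,
    init_states: Formula,
    edges: Dict[Edge, Callable[[State], Iterable[State]]],
) -> Optional[Edge]:
    """Return None if I is an inductive invariant, else an edge on which it fails."""
    if not init_states <= hypothesized_invariant(S, init_loc):
        return ("<init>", init_loc)     # initial states are not covered
    for (u, v), step in edges.items():
        post = frozenset(t for s in hypothesized_invariant(S, u) for t in step(s))
        if not post <= hypothesized_invariant(S, v):
            return (u, v)               # I(u) is not preserved along (u, v)
    return None
```

A location with no annotation yet gets the empty disjunction, i.e. false, so an empty $S$ is immediately rejected at the initial location.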




ProvePath: Constructing a Proof of a Path Figure 10 shows an algorithm
for computing a proof of a path $\pi$. First, in lines 2-13, we check if $\pi$ is feasible
(i.e., whether $\pi$ corresponds to a real program execution). We do this by
computing the strongest postconditions along $\pi$ (using exec) and then attempting to
strengthen the annotation (using Refine) with theory interpolants to prove that
false is a postcondition of $\pi$. If no such strengthening is found, we know that $\pi$
is feasible (by the completeness theorem proved in Appendix B.3). Note that if $\pi$
is memory-infeasible (lines 7-10), then
we only compute spatial interpolants along the memory-feasible prefix of the
path and return the result. This is because when the path is memory-infeasible,
we do not need data refinements along the path to prove it cannot reach $v_e$.


The function Spatial($(v_1, \ldots, v_n)$, $P$) takes a program path $\pi = v_1, \ldots, v_n$
and a Sep formula $P$ and returns the path annotated with spatial interpolants
w.r.t. the postcondition $P$. The function Refine($\kappa$, $\varphi$) takes an annotated program
path $\kappa$ with Sep formulas (from spatial interpolants) and a $\varphi \in$ DFormula and
returns a refined annotation of $\kappa$ that proves the postcondition $\varphi$ (using theory
interpolants).


If the path $\pi$ is infeasible, we proceed by constructing spatial(T) interpolants
for it (lines 15-19). We use the function Spatial to construct spatial
interpolants, which we then refine with theory interpolants using the function
Refine. Spatial path interpolants are computed with respect to the
postcondition true : true, indicating that we are looking for a memory safety
proof. Note that we might not be able to find theory interpolants if the spatial
interpolants computed are too weak and hide important data elements (in
which case, on line 18, we use the result of exec as the spatial interpolants -
the strongest possible spatial interpolants). To illustrate how this could happen,
consider Figure 11, a modification of our earlier illustrative example.


node* x = null; int i = 2;
while (i > 0)
    node* tmp = malloc(node);
    tmp->N = x; tmp->D = i;
    x = tmp; i--;
i = 1;
P: while (x != null)
    if (isOdd(i))
        assert(isOdd(x->D))
    x = x->N; i++;


Fig. 11. Refinement Example. 

Here, a list of length 2 is constructed. The second loop checks that nodes 
at odd positions in the linked list have odd data elements. Suppose SplInter 
samples the path that goes through the first loop twice and enters the second 
loop arriving at the assertion. Our spatial interpolation procedure will introduce 
a list segment $\mathsf{ls}(x, \mathsf{null})$ at location P. As a result, we cannot find a refinement
that proves the assertion, since using the list segment predicate definition
given earlier, we cannot specify that the first element is odd and the second is
even. This is because refinements must apply to all elements of $\mathsf{ls}$, and cannot
mention specific positions. In this case, we use the symbolic heaps as our spatial
interpolants. That is, we annotate location P with
$x \mapsto [d'_1, e'] * e' \mapsto [d'_2, \mathsf{null}]$.
Consequently, theory interpolants are able to refine this by specifying that $d'_1$ is
odd.
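The example of Figure 11 can be replayed concretely. The following Python sketch is only an illustration (the class `Node` and the helper names are hypothetical, and the assertion is turned into a Boolean result): it builds the two-element list, runs the second loop, and shows that neither "all elements are odd" nor "all elements are even" describes the list, whereas a statement about the first cell alone does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    D: int                    # data field
    N: Optional["Node"]       # next pointer


def build_list() -> Optional[Node]:
    """First loop of Figure 11: prepend nodes with data 2, then 1."""
    x: Optional[Node] = None
    i = 2
    while i > 0:
        x = Node(D=i, N=x)
        i -= 1
    return x


def data_at_P(x: Optional[Node]) -> List[int]:
    """Data values of the list reachable from x at location P, in order."""
    out = []
    while x is not None:
        out.append(x.D)
        x = x.N
    return out


def assertions_hold(x: Optional[Node]) -> bool:
    """Second loop of Figure 11: nodes at odd positions must hold odd data."""
    i = 1
    while x is not None:
        if i % 2 == 1 and x.D % 2 != 1:
            return False
        x = x.N
        i += 1
    return True


data = data_at_P(build_list())
uniform_odd = all(d % 2 == 1 for d in data)    # refinement "every element is odd"
uniform_even = all(d % 2 == 0 for d in data)   # refinement "every element is even"
first_is_odd = data[0] % 2 == 1                # positional fact about the first cell
```

Here `data` is `[1, 2]`, so both uniform refinements fail while the positional fact holds, which is why the per-cell symbolic heap is needed as the spatial interpolant.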

Conj: Conjoining Proofs of Paths When a new annotated path $\kappa$ is computed,
we strengthen the proofs of all annotated paths $\kappa'$ in $S$ that share a
prefix with $\kappa$ using an operation Conj($\kappa$, $\kappa'$), defined in the following. (This
is analogous to strengthening annotations of a path in an ART - all other
paths sharing a prefix with the strengthened path also get strengthened.) Let
$\kappa = (a_1, v_1), \ldots, (a_n, v_n)$ and $\kappa' = (a'_1, v'_1), \ldots, (a'_m, v'_m)$. Let $k$ be the largest
integer such that for all $j \le k$, $v_j = v'_j$ (i.e., $k$ represents the longest shared
prefix of $\kappa$ and $\kappa'$). Conj returns a pair $(\kappa, \kappa')$ consisting of the strengthened
annotations:

$$\kappa \leftarrow (a_1 \wedge a'_1, v_1), \ldots, (a_k \wedge a'_k, v_k), (a_{k+1}, v_{k+1}), \ldots, (a_n, v_n)$$
$$\kappa' \leftarrow (a_1 \wedge a'_1, v_1), \ldots, (a_k \wedge a'_k, v_k), (a'_{k+1}, v'_{k+1}), \ldots, (a'_m, v'_m)$$

The issue here is that RSep is not closed under logical conjunction ($\wedge$), since we
do not allow logical conjunction of spatial conjunctions, e.g.,
$(\mathsf{ls}(x, y) * \mathsf{true}) \wedge (\mathsf{ls}(y, z) * \mathsf{true})$.
In practice, we heuristically under-approximate the logical conjunction,
using the strongest postcondition of the shared prefix to guide the
under-approximation. Any under-approximation which over-approximates the
strongest postcondition (including the strongest postcondition itself) is sound,
but overly strong annotations may not generalize to a proof of the whole program.
Note that the above transformation maintains the invariant that all paths
in $S$ are annotated with valid Hoare triples.
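The shape of Conj can be sketched as follows. As an assumption made purely for illustration, annotations are finite sets of states, so that conjunction is exact set intersection; this sidesteps the closure problem just discussed, which a real implementation over RSep has to handle by under-approximation.

```python
from typing import FrozenSet, List, Tuple

Formula = FrozenSet[int]      # assumption: a formula is its finite set of models
AnnotatedPath = List[Tuple[Formula, str]]


def conj(kappa: AnnotatedPath, kappa2: AnnotatedPath) -> Tuple[AnnotatedPath, AnnotatedPath]:
    """Strengthen both paths along their longest shared prefix of locations."""
    k = 0
    while k < min(len(kappa), len(kappa2)) and kappa[k][1] == kappa2[k][1]:
        k += 1
    # Conjoin (intersect) the annotations of the shared prefix.
    shared = [(kappa[j][0] & kappa2[j][0], kappa[j][1]) for j in range(k)]
    # Beyond the prefix, each path keeps its own annotations.
    return shared + kappa[k:], shared + kappa2[k:]
```

Note that only a common prefix is merged: two paths that both end in the error location but diverge earlier keep separate annotations after the point of divergence.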

B Proofs 

In this section, we present proof sketches for the theorems appearing in this 
paper. 

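B.1 Proof of Theorem 4

Theorem 4. Let $S$ and $I'$ be RSep formulas and let $c$ be a command such that
$\mathsf{exec}(c, S) \models I'$. Then

(I) $S \vdash \mathsf{itp}(S, c, I')$

(II) $\{\mathsf{itp}(S, c, I')\}\ c\ \{I'\}$

Proof. We prove one case to give intuition on why the theorem holds.

Suppose $c$ is an allocation statement x := new(n,m).

Recall that we defined

$$\mathsf{itp}(S, c, I') = (\exists x.\ A)$$

where

$$\mathsf{exec}(c, S) \vdash (\exists \emptyset.\ (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}]) * [A]) \vdash I'$$

Define

$$S' = \mathsf{exec}(c, S) = S[x'/x] * x \mapsto [\vec{a}, \vec{z}]$$

for $x', \vec{a}, \vec{z}$ fresh.

First we show (I). By the properties of bounded abduction, we have

$$S' = S[x'/x] * x \mapsto [\vec{a}, \vec{z}] \models A * (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}])$$

from which we can see that

$$S[x'/x] \models A \models (\exists x.\ A)$$

and thus

$$S \vdash (\exists x.\ A) = \mathsf{itp}(S, c, I')$$

Next we show (II). We compute (where $x', \vec{a}, \vec{z}$ are fresh)

$$\mathsf{exec}(c, (\exists x.\ A)) = (\exists x', \vec{a}, \vec{z}.\ (\exists x.\ A)[x'/x] * x \mapsto [\vec{a}, \vec{z}])$$
$$= (\exists x', \vec{a}, \vec{z}.\ (\exists x.\ A) * x \mapsto [\vec{a}, \vec{z}])$$
$$= (\exists x.\ A) * (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}])$$

Since $S'$ is of the form $S[x'/x] * x \mapsto [\vec{a}, \vec{z}]$ and $S' \models A * \mathsf{true}$, the only place
where $x$ may appear in $A$ is in a disequality with some other allocated variable.
It follows that

$$(\exists x.\ A) * (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}]) = A * (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}])$$

By the properties of bounded abduction, we have

$$A * (\exists \vec{a}, \vec{z}.\ x \mapsto [\vec{a}, \vec{z}]) \models I'$$

and thus

$$\mathsf{exec}(c, (\exists x.\ A)) \models I'$$

and finally

$$\{\mathsf{itp}(S, c, I')\}\ c\ \{I'\}$$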


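B.2 Proof of Theorem 5

Theorem 5 (Soundness). Suppose that $\pi$ is a path and that $\zeta$ is a proof of the
judgement $C \blacktriangleright \{\Pi : \Sigma\}\ \pi\ \{\Pi' : \Sigma'\}$, and that $\sigma$ is a solution to $C$. Then $\zeta^\sigma$,
the proof obtained by applying the substitution $\sigma$ to $\zeta$, is a (refined) separation
logic proof of

$$\{(\Pi : \Sigma)^\sigma\}\ \pi\ \{(\Pi' : \Sigma')^\sigma\}.$$

Proof. The proof proceeds by induction on $\zeta$. We will give an illustrative example
using an entailment judgement.

Suppose that $\zeta$ is an entailment proof consisting of a single application of
the Predicate rule:

Predicate
$$\frac{\Pi \models \Pi'}{\Phi' \leftarrow \Phi;\ \Psi'_1 \leftarrow \Psi_1 \wedge \Phi;\ \ldots;\ \Psi'_{|\vec{\tau}|} \leftarrow \Psi_{|\vec{\tau}|} \wedge \Phi\ \blacktriangleright\ \Pi \wedge \Phi : Z(\vec{\tau}, \vec{E}) \vdash \Pi' \wedge \Phi' : Z(\vec{\tau}', \vec{E})}$$

(where $\tau_i = (\lambda \vec{a}_i.\ \Psi_i)$ and $\tau'_i = (\lambda \vec{a}_i.\ \Psi'_i)$). Suppose that $\sigma$ is a solution to the
constraint system

$$C = \Phi' \leftarrow \Phi;\ \Psi'_1 \leftarrow \Psi_1 \wedge \Phi;\ \ldots;\ \Psi'_{|\vec{\tau}|} \leftarrow \Psi_{|\vec{\tau}|} \wedge \Phi$$

Since $\sigma$ is a solution to $C$, every constraint of $C$ holds under $\sigma$,
and thus (noting that $\Pi \models \Pi'$)

$$\Pi \wedge \Phi^\sigma \models \Pi' \wedge \Phi'^\sigma \quad \text{and for all } i,\ \Psi_i^\sigma \wedge \Pi \wedge \Phi^\sigma \models \Psi_i'^\sigma$$

It follows that $\zeta^\sigma$, given below, is a valid derivation:

Predicate
$$\frac{\Pi \wedge \Phi^\sigma \models \Pi' \wedge \Phi'^\sigma \qquad \Psi_1^\sigma \wedge \Pi \wedge \Phi^\sigma \models \Psi_1'^\sigma \quad \cdots \quad \Psi_{|\vec{\tau}|}^\sigma \wedge \Pi \wedge \Phi^\sigma \models \Psi_{|\vec{\tau}|}'^\sigma}{\Pi \wedge \Phi^\sigma : Z(\vec{\tau}^\sigma, \vec{E}) \vdash \Pi' \wedge \Phi'^\sigma : Z(\vec{\tau}'^\sigma, \vec{E})}$$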

B.3 Proof of Theorem 6

Theorem 6 (Completeness). Suppose that $\pi$ is a memory-safe path and $\zeta$ is
the proof of the judgement

$$C \blacktriangleright \{R_0(\vec{v}) : \mathsf{emp}\}\ \pi\ \{R_1(\vec{v}) : \mathsf{true}\}$$

obtained by symbolic execution. If $\psi$ is a data formula such that
$\{\mathsf{true} : \mathsf{emp}\}\ \pi\ \{\psi : \mathsf{true}\}$ holds, then there is a solution $\sigma$ to $C$ such that $R_1^\sigma(\vec{v}) \Rightarrow \psi$.

Proof. Consider that for each formula $(\exists X.\ \Pi : \Sigma)$ in a symbolic execution
sequence, $\Sigma$ is a $*$-conjunction of (finitely many) points-to predicates. The
constraints we generate in this situation are the same as the ones that would be
generated for a program which does not access the heap (but which has additional
variables corresponding to data-typed heap fields).

C Formalization of RSep and Sep 

In this section we present the full proof systems for RSep and Sep, as well as
the full set of constraint generation rules described in the main text. The
syntax and semantics of RSep formulas, in terms of stacks and heaps, are shown
in Figures 12 and 13.




Syntax

$x, y \in \mathsf{HVar}$    Heap variables
$a, b \in \mathsf{DVar}$    Data variables
$A \in \mathsf{DTerm}$    First-order term that evaluates to value in $D$
$\varphi \in \mathsf{DFormula}$    Data formulas
$Z \in \mathsf{RPred}$    Recursive predicates
$\theta \in \mathsf{Refinement} ::= \lambda \vec{a}.\ \varphi$
$X \subseteq \mathsf{Var} ::= x \mid a$
$E, F \in \mathsf{HTerm} ::= \mathsf{null} \mid x$
$Æ ::= A \mid E$
$\Pi \in \mathsf{Pure} ::= \mathsf{true} \mid E = E \mid E \neq E \mid \varphi \mid \Pi \wedge \Pi$
$H \in \mathsf{Heaplet} ::= \mathsf{true} \mid \mathsf{emp} \mid E \mapsto [\vec{A}, \vec{E}] \mid Z(\vec{\theta}, \vec{E})$
$\Sigma \in \mathsf{Spatial} ::= H \mid H * \Sigma$
$P \in \mathsf{RSep} ::= (\exists X.\ \Pi : \Sigma)$

Fig. 12. Syntax of RSep formulas.

D Spatial interpolation for assumptions

In the spatial interpolation rules for assume presented earlier, we encountered
the following problem: we have an equality or disequality assertion $\Pi$, a
symbolic heap $S$, and a formula $M$ such that $S \wedge \Pi \vdash M$, and we need to
compute a formula $M'$ such that $S \vdash M'$ and $M' \wedge \Pi \vdash M$. Moreover, we wish $M'$
to be as weak as possible (i.e., $M'$ should be "close to $M$" rather than "close to
$S$").

In this section, we will define a recursive procedure $\mathsf{pitp}(S, \Pi, M)$ which takes
as input a symbolic heap $S$, an equality or disequality formula $\Pi$, and a Sep
formula $M$ such that $S \wedge \Pi \vdash M$ and computes a formula $M' = \mathsf{pitp}(S, \Pi, M)$
such that $S \vdash M'$ and $M' \wedge \Pi \vdash M$. We will assume that $S = \Pi_S : \Sigma_S$ is
saturated in the sense that for any assertion $\Pi_0$, if $S \vdash \Pi_0$ then $\Pi_S \vdash \Pi_0$, and
that $S \wedge \Pi$ is satisfiable.

We observe that if $M$ has existentially quantified variables, they can be
instantiated using the proof of $S \wedge \Pi \vdash M$. Thus we may assume that $M$ is
quantifier-free, and write $M$ as

$$M = \Pi_M : H_1 * \cdots * H_n$$

The proof of $S \wedge \Pi \vdash M$ also induces an $n$-colouring on $S$ which colours each
points-to predicate in $S$ with the index $i$ of the corresponding heaplet $H_i$ (cf.
step 1 of the bounded abduction procedure). We may thus
write $S$ as follows:

$$S = \Pi_S : [\Sigma_1]^1 * \cdots * [\Sigma_n]^n$$

(such that for each $i$, $\Pi_S \wedge \Pi : \Sigma_i \vdash \Pi_M : H_i$).

We will compute a pure formula $\Pi'_M$ such that $\Pi_S \vdash \Pi'_M$ and $\Pi'_M \wedge \Pi \vdash \Pi_M$,
and for each $i$ we will compute a formula $P_i$ such that $\Pi_S : \Sigma_i \vdash P_i$ and
$P_i \wedge \Pi \vdash H_i$. We then take $\mathsf{pitp}(S, \Pi, M)$ to be $\Pi'_M : P_1 * \cdots * P_n$.

First, we show how to compute $\Pi'_M$. Note that since $S$ is saturated,
the fact that $S \wedge \Pi \vdash M$ implies $\Pi_S \wedge \Pi \vdash \Pi_M$. We will assume that $\Pi_M$
consists of a single equality or disequality: the procedure can be extended to
arbitrary conjunction by applying it separately for each conjunct and conjoining
the results. If $\Pi_S \vdash \Pi_M$ then we simply take $\Pi'_M$ to be $\Pi_M$. Otherwise, assume
$w, x, y, z$ are such that $\Pi$ is an equality or disequality $w = x$ / $w \neq x$ and $\Pi_M$
is an equality or disequality $y = z$ / $y \neq z$. Since $\Pi_S \wedge \Pi \vdash \Pi_M$ and $\Pi_S \nvdash \Pi_M$,
there is some $y', z' \in \{w, x\}$ such that $\Pi_S \vdash y = y' \wedge z = z'$ (to see why, consider
each of the four cases for $\Pi$ and $\Pi_M$). We take $\Pi'_M$ to be $y = y' \wedge z = z'$.

Finally, we show how to compute $P_i$ (for each $i \in [1, n]$). If $\Pi_S : \Sigma_i \vdash H_i$,
then we simply take $P_i$ to be $H_i$. Otherwise, suppose that $H_i$ is $Z(\vec{E})$ for some
predicate $Z$ and vector of heap terms $\vec{E}$ (the case that $H_i$ is a points-to predicate
is essentially a special case). First, we attempt to find a vector of heap terms $\vec{E}'$
such that $\Pi_S : \Sigma_i \vdash Z(\vec{E}')$ and $\Pi_S \wedge \Pi \vdash E_j = E'_j$ for each $j$ (noting that there
are finitely many such $\vec{E}'$ to choose from). If we succeed, we may take $P_i$ to be
$\Pi' : Z(\vec{E}')$, where $\Pi' = \mathsf{pitp}(S, \Pi, \bigwedge_j E_j = E'_j)$. If we fail, then since
$\Pi_S \wedge \Pi : \Sigma_i \vdash Z(\vec{E})$, there is some $Q \in \mathit{cases}(Z(\vec{R}, \vec{x}))$ such that
$\Pi_S \wedge \Pi : \Sigma_i \vdash Q[\vec{E}/\vec{x}]$.
We may take $P_i$ to be $\mathsf{pitp}(\Pi_S : \Sigma_i, \Pi, Q[\vec{E}/\vec{x}])$.
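The pure part of this procedure is small enough to sketch in Python. The sketch below is an illustration under simplifying assumptions, not the authors' code: the saturated pure part $\Pi_S$ is given as explicit lists of equalities and disequalities between variable names, atoms are triples `(op, left, right)` with `op` being `"="` or `"!="`, and entailment of pure atoms is decided by a naive union-find. The names `PureContext` and `pure_interpolant` are hypothetical.

```python
from typing import List, Optional, Tuple

Atom = Tuple[str, str, str]   # (op, left, right) with op in {"=", "!="}


class PureContext:
    """Pi_S as lists of equalities and disequalities (an assumption of this sketch)."""

    def __init__(self, equalities, disequalities):
        self.parent = {}
        for a, b in equalities:
            self.parent[self.find(a)] = self.find(b)   # merge the two classes
        self.diseqs = list(disequalities)

    def find(self, t: str) -> str:
        self.parent.setdefault(t, t)
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def entails(self, atom: Atom) -> bool:
        op, a, b = atom
        ra, rb = self.find(a), self.find(b)
        if op == "=":
            return ra == rb
        # A disequality is entailed if some recorded one separates the same classes.
        return any({self.find(c), self.find(d)} == {ra, rb} and ra != rb
                   for c, d in self.diseqs)


def pure_interpolant(ctx: PureContext, pi: Atom, pi_m: Atom) -> Optional[List[Atom]]:
    """Compute Pi'_M (as a conjunction of atoms) for single atoms Pi and Pi_M."""
    if ctx.entails(pi_m):
        return [pi_m]                     # Pi_S already proves Pi_M: reuse it
    _, w, x = pi
    _, y, z = pi_m
    for y2 in (w, x):                     # search y', z' among the terms of Pi
        for z2 in (w, x):
            if ctx.entails(("=", y, y2)) and ctx.entails(("=", z, z2)):
                return [("=", y, y2), ("=", z, z2)]
    return None                           # precondition S and Pi |- M was violated
```

For instance, with $\Pi_S$ containing $y = w$ and $z = x$, $\Pi$ being $w = x$ and $\Pi_M$ being $y = z$, the result is $y = w \wedge z = x$: it follows from $\Pi_S$ alone and, together with $\Pi$, yields $\Pi_M$.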



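Semantic Domains

$\mathsf{Var} = \mathsf{HVar} + \mathsf{DVar}$
$\mathsf{Val} = \mathsf{Loc} + D$
$\mathsf{Stack} = \mathsf{Var} \to \mathsf{Val}$
$\mathsf{Heap} = \mathsf{Loc} \rightharpoonup_{fin} \mathsf{Rec}$
$\mathsf{Rec} = (\mathbb{N} \rightharpoonup_{fin} D,\ \mathbb{N} \rightharpoonup_{fin} \mathsf{Loc})$
$\mathsf{State} = \mathsf{Stack} \times \mathsf{Heap}$

Satisfaction Semantics

$s, h \models E = F \iff [\![E]\!](s) = [\![F]\!](s)$
$s, h \models E \neq F \iff [\![E]\!](s) \neq [\![F]\!](s)$
$s, h \models \varphi \iff [\![\varphi]\!](s)$
$s, h \models \Pi_1 \wedge \Pi_2 \iff (s, h \models \Pi_1)$ and $(s, h \models \Pi_2)$
$s, h \models Z(\vec{\tau}, \vec{E}) \iff \exists P \in \mathit{cases}(Z(\vec{R}, \vec{x})).\ s, h \models P[\vec{\tau}/\vec{R}, \vec{E}/\vec{x}]$
$s, h \models \mathsf{emp} \iff \mathrm{dom}(h) = \emptyset$
$s, h \models E \mapsto [\vec{A}, \vec{F}] \iff \mathrm{dom}(h) = \{[\![E]\!](s)\}$
    and $h([\![E]\!](s)) = (\{i \mapsto [\![A_i]\!](s) \mid i \in [1, |\vec{A}|]\},\ \{i \mapsto [\![F_i]\!](s) \mid i \in [1, |\vec{F}|]\})$
$s, h \models \Sigma * \Sigma' \iff$ there exists $h_0, h_1$ s.t. $h_0 \uplus h_1 = h$
    and $(s, h_0 \models \Sigma)$ and $(s, h_1 \models \Sigma')$
$s, h \models \Pi : \Sigma \iff (s, h \models \Pi)$ and $(s, h \models \Sigma)$
$s, h \models (\exists X.\ \Pi : \Sigma) \iff$ there exists $\bar{s} : X \to \mathsf{Val}$ s.t. $s \oplus \bar{s}, h \models \Pi : \Sigma$

Note that we model records (Rec) as two finite maps representing data
fields and heap fields.

We use $\uplus$ to denote union of functions with disjoint domains, and $\oplus$ to denote
overriding union of functions.

Fig. 13. Stack/heap semantics of RSep formulas.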
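The satisfaction relation for the predicate-free spatial fragment can be checked directly on small concrete states. The sketch below is an illustration only and makes several assumptions: a stack is a dictionary from variable names to values, `"null"` evaluates to `None`, a heap is a dictionary from locations to records, and a record is a pair of tuples (data fields, heap fields) standing in for the two finite maps of Rec. Separating conjunction is decided by enumerating all splits of the heap, which is exponential and only meant for tiny examples.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Stack = Dict[str, object]
Record = Tuple[tuple, tuple]          # (data fields, heap fields), an assumption
Heap = Dict[int, Record]
Assertion = Callable[[Stack, Heap], bool]


def ev(s: Stack, term: str):
    """Evaluate a term: 'null' denotes None, anything else is looked up in the stack."""
    return None if term == "null" else s[term]


def sat_emp(s: Stack, h: Heap) -> bool:
    """s, h |= emp  iff  dom(h) is empty."""
    return len(h) == 0


def points_to(e: str, data: Sequence[str], ptrs: Sequence[str]) -> Assertion:
    """Build the assertion e |-> [data, ptrs]: h is exactly one matching cell."""
    def check(s: Stack, h: Heap) -> bool:
        loc = ev(s, e)
        expected = (tuple(ev(s, a) for a in data), tuple(ev(s, f) for f in ptrs))
        return set(h) == {loc} and h[loc] == expected
    return check


def star(left: Assertion, right: Assertion) -> Assertion:
    """Build left * right: some split of h into disjoint h0, h1 satisfies both."""
    def check(s: Stack, h: Heap) -> bool:
        locs = list(h)
        for r in range(len(locs) + 1):
            for chosen in combinations(locs, r):
                h0 = {l: h[l] for l in chosen}
                h1 = {l: h[l] for l in locs if l not in chosen}
                if left(s, h0) and right(s, h1):
                    return True
        return False
    return check
```

As a usage example, the two-cell heap from the refinement example, with stack `{"x": 1, "e": 2, "d1": 1, "d2": 2}` and heap `{1: ((1,), (2,)), 2: ((2,), (None,))}`, satisfies `star(points_to("x", ["d1"], ["e"]), points_to("e", ["d2"], ["null"]))` but satisfies neither points-to assertion on its own, since each requires the heap to consist of exactly one cell.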



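Entailment rules

Empty
$$\frac{\Pi \models \Pi'}{\Pi : \mathsf{emp} \vdash \Pi' : \mathsf{emp}}$$

True
$$\frac{\Pi \models \Pi'}{\Pi : \Sigma \vdash \Pi' : \mathsf{true}}$$

$\exists$-left
$$\frac{P[x'/x] \vdash Q}{(\exists x.\ P) \vdash Q}$$

$\exists$-right
$$\frac{P \vdash Q[Æ/x]}{P \vdash (\exists x.\ Q)}$$

Predicate (where $\tau_i = (\lambda \vec{a}_i.\ \varphi_i)$ and $\tau'_i = (\lambda \vec{a}_i.\ \varphi'_i)$)
$$\frac{\Pi \models \Pi' \qquad \varphi_1 \wedge \Pi \models \varphi'_1 \quad \cdots \quad \varphi_{|\vec{\tau}|} \wedge \Pi \models \varphi'_{|\vec{\tau}|}}{\Pi : Z(\vec{\tau}, \vec{E}) \vdash \Pi' : Z(\vec{\tau}', \vec{E})}$$

Star
$$\frac{\Pi : \Sigma_0 \vdash \Pi' : \Sigma'_0 \qquad \Pi : \Sigma_1 \vdash \Pi' : \Sigma'_1}{\Pi : \Sigma_0 * \Sigma_1 \vdash \Pi' : \Sigma'_0 * \Sigma'_1}$$

Substitution
$$\frac{\Pi[E/x] : \Sigma[E/x] \vdash \Pi'[E/x] : \Sigma'[E/x] \qquad \Pi \models x = E}{\Pi : \Sigma \vdash \Pi' : \Sigma'}$$

Points-to
$$\frac{\Pi \models \Pi'}{\Pi : E \mapsto [\vec{A}, \vec{F}] \vdash \Pi' : E \mapsto [\vec{A}, \vec{F}]}$$

null-not-Lval
$$\frac{\Pi \wedge E \neq \mathsf{null} : \Sigma * E \mapsto [\vec{A}, \vec{F}] \vdash \Pi' : \Sigma'}{\Pi : \Sigma * E \mapsto [\vec{A}, \vec{F}] \vdash \Pi' : \Sigma'}$$

$*$-Partial
$$\frac{\Pi \wedge E \neq F : \Sigma * E \mapsto [\vec{A}, \vec{E}] * F \mapsto [\vec{B}, \vec{F}] \vdash \Pi' : \Sigma'}{\Pi : \Sigma * E \mapsto [\vec{A}, \vec{E}] * F \mapsto [\vec{B}, \vec{F}] \vdash \Pi' : \Sigma'}$$

Fold (where $P \in \mathit{cases}(Z(\vec{R}, \vec{x}))$)
$$\frac{\Pi : \Sigma \vdash \Pi' : \Sigma' * P[\vec{\tau}/\vec{R}, \vec{E}/\vec{x}]}{\Pi : \Sigma \vdash \Pi' : \Sigma' * Z(\vec{\tau}, \vec{E})}$$

Unfold (where $\{P_1, \ldots, P_n\} = \mathit{cases}(Z(\vec{R}, \vec{x}))$)
$$\frac{\Pi : \Sigma * P_1[\vec{\tau}/\vec{R}, \vec{E}/\vec{x}] \vdash \Pi' : \Sigma' \quad \cdots \quad \Pi : \Sigma * P_n[\vec{\tau}/\vec{R}, \vec{E}/\vec{x}] \vdash \Pi' : \Sigma'}{\Pi : \Sigma * Z(\vec{\tau}, \vec{E}) \vdash \Pi' : \Sigma'}$$

Fig. 14. RSep Proof System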



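Execution rules

Assign
$$\{\Pi : \Sigma\}\ x := Æ\ \{(\exists x'.\ \Pi[x'/x] \wedge x = Æ[x'/x] : \Sigma[x'/x])\}$$

Assume
$$\{\Pi : \Sigma\}\ \mathsf{assume}(\Pi')\ \{\Pi \wedge \Pi' : \Sigma\}$$

Sequence
$$\frac{\{P\}\ \pi_0\ \{O\} \qquad \{O\}\ \pi_1\ \{Q\}}{\{P\}\ \pi_0; \pi_1\ \{Q\}}$$

Data-Store
$$\frac{\Pi : \Sigma \vdash \Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}]}{\{\Pi : \Sigma\}\ x\texttt{->}D_i := A\ \{\Pi' : \Sigma' * x \mapsto [\vec{A}[A/A_i], \vec{E}]\}}$$

Heap-Store
$$\frac{\Pi : \Sigma \vdash \Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}]}{\{\Pi : \Sigma\}\ x\texttt{->}N_i := E\ \{\Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}[E/E_i]]\}}$$

Data-Load
$$\frac{\Pi : \Sigma \vdash \Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}]}{\{\Pi : \Sigma\}\ y := x\texttt{->}D_i\ \{(\exists y'.\ \Pi'[y'/y] \wedge y = A_i[y'/y] : (\Sigma' * x \mapsto [\vec{A}, \vec{E}])[y'/y])\}}$$

Heap-Load
$$\frac{\Pi : \Sigma \vdash \Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}]}{\{\Pi : \Sigma\}\ y := x\texttt{->}N_i\ \{(\exists y'.\ \Pi'[y'/y] \wedge y = E_i[y'/y] : (\Sigma' * x \mapsto [\vec{A}, \vec{E}])[y'/y])\}}$$

Free
$$\frac{\Pi : \Sigma \vdash \Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}]}{\{\Pi : \Sigma\}\ \mathsf{free}(x)\ \{\Pi' : \Sigma'\}}$$

Alloc
$$\{\Pi : \Sigma\}\ x := \mathsf{new}(n, m)\ \{(\exists x', \vec{a}, \vec{y}.\ \Pi[x'/x] : \Sigma[x'/x] * x \mapsto [\vec{a}, \vec{y}])\}$$

Consequence
$$\frac{P' \vdash P \qquad \{P\}\ c\ \{Q\} \qquad Q \vdash Q'}{\{P'\}\ c\ \{Q'\}}$$

Exists (where $x \notin \mathrm{Var}(Q) \cup \mathrm{Var}(c)$)
$$\frac{\{P\}\ c\ \{Q\}}{\{(\exists x.\ P)\}\ c\ \{Q\}}$$

Fig. 15. RSep proof system.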



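Predicate
$$\frac{\Pi \models \Pi'}{\Pi : Z(\vec{E}) \vdash \Pi' : Z(\vec{E})}$$

Fold (where $P \in \mathit{cases}(Z(\vec{R}, \vec{x}))$)
$$\frac{\Pi : \Sigma \vdash \Pi' : \Sigma' * P[\vec{E}/\vec{x}]}{\Pi : \Sigma \vdash \Pi' : \Sigma' * Z(\vec{E})}$$

Unfold (where $\{P_1, \ldots, P_n\} = \mathit{cases}(Z(\vec{R}, \vec{x}))$)
$$\frac{\Pi : \Sigma * P_1[\vec{E}/\vec{x}] \vdash \Pi' : \Sigma' \quad \cdots \quad \Pi : \Sigma * P_n[\vec{E}/\vec{x}] \vdash \Pi' : \Sigma'}{\Pi : \Sigma * Z(\vec{E}) \vdash \Pi' : \Sigma'}$$

Fig. 16. Sep proof system. All other entailment and execution rules are as in the
RSep proof system (Figures 14 and 15).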





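Entailment rules

Empty
$$\frac{\Pi \models \Pi'}{\Phi' \leftarrow \Phi\ \blacktriangleright\ \Pi \wedge \Phi : \mathsf{emp} \vdash \Pi' \wedge \Phi' : \mathsf{emp}}$$

True
$$\frac{\Pi \models \Pi'}{\Phi' \leftarrow \Phi\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma \vdash \Pi' \wedge \Phi' : \mathsf{true}}$$

Inconsistent
$$\frac{\Pi \models \mathsf{false}}{[\,]\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma \vdash \Pi' \wedge \Phi' : \Sigma'}$$

$\exists$-left
$$\frac{C\ \blacktriangleright\ P[x'/x] \vdash Q}{C\ \blacktriangleright\ (\exists x.\ P) \vdash Q}$$

$\exists$-right
$$\frac{C\ \blacktriangleright\ P \vdash Q[Æ/x]}{C\ \blacktriangleright\ P \vdash (\exists x.\ Q)}$$

Substitution
$$\frac{C\ \blacktriangleright\ \Pi[E/x] \wedge \Phi : \Sigma[E/x] \vdash \Pi'[E/x] \wedge \Phi' : \Sigma'[E/x] \qquad \Pi \models x = E}{C\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma \vdash \Pi' \wedge \Phi' : \Sigma'}$$

null-not-Lval
$$\frac{C\ \blacktriangleright\ \Pi \wedge E \neq \mathsf{null} \wedge \Phi : \Sigma * E \mapsto [\vec{A}, \vec{F}] \vdash \Pi' \wedge \Phi' : \Sigma'}{C\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma * E \mapsto [\vec{A}, \vec{F}] \vdash \Pi' \wedge \Phi' : \Sigma'}$$

$*$-Partial
$$\frac{C\ \blacktriangleright\ \Pi \wedge E \neq F \wedge \Phi : \Sigma * E \mapsto [\vec{A}, \vec{E}] * F \mapsto [\vec{B}, \vec{F}] \vdash \Pi' \wedge \Phi' : \Sigma'}{C\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma * E \mapsto [\vec{A}, \vec{E}] * F \mapsto [\vec{B}, \vec{F}] \vdash \Pi' \wedge \Phi' : \Sigma'}$$

Star
$$\frac{C_0\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma_0 \vdash \Pi' \wedge \Phi' : \Sigma'_0 \qquad C_1\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma_1 \vdash \Pi' \wedge \Phi' : \Sigma'_1}{C_0; C_1\ \blacktriangleright\ \Pi \wedge \Phi : \Sigma_0 * \Sigma_1 \vdash \Pi' \wedge \Phi' : \Sigma'_0 * \Sigma'_1}$$

Points-to
$$\frac{\Pi \models \Pi'}{\Phi' \leftarrow \Phi\ \blacktriangleright\ \Pi \wedge \Phi : E \mapsto [\vec{A}, \vec{F}] \vdash \Pi' \wedge \Phi' : E \mapsto [\vec{A}, \vec{F}]}$$

Fold (where $P \in \mathit{cases}(Z(\vec{R}, \vec{x}))$)
$$\frac{C\ \blacktriangleright\ \Pi : \Sigma \vdash \Pi' : \Sigma' * P[\vec{\tau}/\vec{R}, \vec{E}/\vec{x}]}{C\ \blacktriangleright\ \Pi : \Sigma \vdash \Pi' : \Sigma' * Z(\vec{\tau}, \vec{E})}$$

Unfold (where $\{P_1, \ldots, P_n\} = \mathit{cases}(Z(\vec{R}, \vec{x}))$)
$$\frac{C_1\ \blacktriangleright\ \Pi : \Sigma * P_1[\vec{\tau}/\vec{R}, \vec{E}/\vec{x}] \vdash \Pi' : \Sigma' \quad \cdots \quad C_n\ \blacktriangleright\ \Pi : \Sigma * P_n[\vec{\tau}/\vec{R}, \vec{E}/\vec{x}] \vdash \Pi' : \Sigma'}{C_1; \ldots; C_n\ \blacktriangleright\ \Pi : \Sigma * Z(\vec{\tau}, \vec{E}) \vdash \Pi' : \Sigma'}$$

Predicate (where $\tau_i = (\lambda \vec{a}_i.\ \Psi_i)$ and $\tau'_i = (\lambda \vec{a}_i.\ \Psi'_i)$)
$$\frac{\Pi \models \Pi'}{\Phi' \leftarrow \Phi;\ \Psi'_1 \leftarrow \Psi_1 \wedge \Phi;\ \ldots;\ \Psi'_{|\vec{\tau}|} \leftarrow \Psi_{|\vec{\tau}|} \wedge \Phi\ \blacktriangleright\ \Pi \wedge \Phi : Z(\vec{\tau}, \vec{E}) \vdash \Pi' \wedge \Phi' : Z(\vec{\tau}', \vec{E})}$$

Fig. 17. Constraint Generation: Entailment Rules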



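Execution rules

Data-Assume
$$\frac{C\ \blacktriangleright\ P \wedge \varphi \vdash Q}{C\ \blacktriangleright\ \{P\}\ \mathsf{assume}(\varphi)\ \{Q\}}$$

Free
$$\frac{C\ \blacktriangleright\ P \vdash \Pi \wedge \Phi : \Sigma * x \mapsto [\vec{A}, \vec{E}]}{C\ \blacktriangleright\ \{P\}\ \mathsf{free}(x)\ \{\Pi \wedge \Phi : \Sigma\}}$$

Sequence
$$\frac{C_0\ \blacktriangleright\ \{P\}\ \pi_0\ \{O\} \qquad C_1\ \blacktriangleright\ \{O\}\ \pi_1\ \{Q\}}{C_0; C_1\ \blacktriangleright\ \{P\}\ \pi_0; \pi_1\ \{Q\}}$$

Data-Load
$$\frac{C_0\ \blacktriangleright\ P \vdash (\exists X.\ \Pi \wedge \Phi : \Sigma * x \mapsto [\vec{A}, \vec{E}]) \qquad C_1\ \blacktriangleright\ (\exists X, a'.\ \Pi[a'/a] \wedge \Phi[a'/a] \wedge a = A_i[a'/a] : (\Sigma * x \mapsto [\vec{A}, \vec{E}])[a'/a]) \vdash Q}{C_0; C_1\ \blacktriangleright\ \{P\}\ a := x\texttt{->}D_i\ \{Q\}}$$

Data-Assign
$$\frac{C\ \blacktriangleright\ (\exists a'.\ \Pi \wedge \Phi[a'/a] \wedge a = A[a'/a] : \Sigma[a'/a]) \vdash Q}{C\ \blacktriangleright\ \{\Pi \wedge \Phi : \Sigma\}\ a := A\ \{Q\}}$$

Data-Store
$$\frac{C_0\ \blacktriangleright\ P \vdash (\exists X.\ \Pi \wedge \Phi : \Sigma * x \mapsto [\vec{A}, \vec{E}]) \qquad C_1\ \blacktriangleright\ (\exists X, a'.\ \Pi \wedge \Phi \wedge a' = A : \Sigma * x \mapsto [\vec{A}[a'/A_i], \vec{E}]) \vdash Q}{C_0; C_1\ \blacktriangleright\ \{P\}\ x\texttt{->}D_i := A\ \{Q\}}$$

Alloc
$$\frac{C\ \blacktriangleright\ (\exists x', \vec{a}, \vec{z}.\ \Pi[x'/x] \wedge \Phi : \Sigma[x'/x] * x \mapsto [\vec{a}, \vec{z}]) \vdash Q}{C\ \blacktriangleright\ \{\Pi \wedge \Phi : \Sigma\}\ x := \mathsf{new}(n, m)\ \{Q\}}$$

Heap-Store
$$\frac{C\ \blacktriangleright\ \Pi : \Sigma \vdash \Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}]}{C\ \blacktriangleright\ \{\Pi : \Sigma\}\ x\texttt{->}N_i := E\ \{\Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}[E/E_i]]\}}$$

Heap-Load
$$\frac{C\ \blacktriangleright\ \Pi : \Sigma \vdash \Pi' : \Sigma' * x \mapsto [\vec{A}, \vec{E}]}{C\ \blacktriangleright\ \{\Pi : \Sigma\}\ y := x\texttt{->}N_i\ \{(\exists y'.\ \Pi'[y'/y] \wedge y = E_i[y'/y] : (\Sigma' * x \mapsto [\vec{A}, \vec{E}])[y'/y])\}}$$

Consequence
$$\frac{C_1\ \blacktriangleright\ P' \vdash P \qquad C_2\ \blacktriangleright\ \{P\}\ c\ \{Q\} \qquad C_3\ \blacktriangleright\ Q \vdash Q'}{C_1; C_2; C_3\ \blacktriangleright\ \{P'\}\ c\ \{Q'\}}$$

Fig. 18. Constraint Generation: Execution Rules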



